CVE-2026-31224: snorkel: RCE via unsafe model deserialization

GHSA-gpx5-7xm4-229w HIGH
Published May 12, 2026
CISO Take

The snorkel library through v0.10.0 uses torch.load() without the weights_only=True safety flag in its MultitaskClassifier.load() method, allowing arbitrary Python objects to be deserialized via the Pickle protocol and resulting in remote code execution on the loading host. Any ML pipeline that ingests snorkel model checkpoints from shared storage, model registries, or external sources faces full code execution exposure with the privileges of the training process — which typically includes access to cloud credentials, internal APIs, and sensitive training data. No public exploit or KEV entry exists yet, but pickle deserialization attacks are extensively documented and require only moderate skill to weaponize, making this a realistic threat for organizations with shared model artifact workflows. Until a patched release above v0.10.0 is available, restrict model loading to cryptographically verified internal sources and run picklescan or fickling over all model files before ingestion.

Sources: NVD, ATLAS

What is the risk?

Medium-High risk. Insecure deserialization via Python's pickle protocol is a mature, well-understood attack class with published exploit tooling, which lowers the bar for weaponization once a malicious model file can be delivered. The vulnerability grants arbitrary code execution on the ML host, a high-impact outcome in enterprise environments where training infrastructure often holds elevated credentials and access to downstream systems. The CVSS 3.1 base score of 8.8 (HIGH) is consistent with that impact. Organizations running automated MLOps pipelines that load snorkel models without artifact integrity checks are the highest-exposure population.

How does the attack unfold?

  1. Artifact Staging (AML.T0018.002): Adversary crafts a malicious snorkel MultitaskClassifier model file embedding a Python pickle payload designed to execute arbitrary code upon deserialization.

  2. Supply Chain Delivery (AML.T0010.003): Malicious model file is uploaded to a shared model registry, S3 bucket, or artifact repository accessible to the target ML pipeline, replacing or supplementing a legitimate checkpoint.

  3. Execution Trigger (AML.T0011.000): Victim's training pipeline or data scientist calls MultitaskClassifier.load() on the malicious file, triggering torch.load() deserialization without weights_only restriction and executing the pickle payload.

  4. System Compromise (AML.T0072): Pickle payload executes with ML process privileges, enabling credential theft, data exfiltration, reverse shell establishment, or lateral movement into connected ML infrastructure and data stores.
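The execution trigger in the flow above rests on one property of pickle: the byte stream, not the loader, decides which callable gets invoked. The following is a deliberately benign sketch of that property using only the standard library. The class name `Swapped` is illustrative, and `int` stands in for whatever callable a real payload would name.

```python
import pickle


class Swapped:
    """Benign stand-in for a crafted object inside a model file."""

    def __reduce__(self):
        # pickle records "call int('42')" instead of this object's state.
        # A hostile file would name a dangerous callable in this position.
        return (int, ("42",))


blob = pickle.dumps(Swapped())

# The loader never reconstructs a Swapped instance: it calls the callable
# named in the byte stream and returns whatever that call produces.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

Because the call happens inside `pickle.loads()`, inspecting the returned object afterwards is too late; any control has to sit before or inside deserialization.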

What systems are affected?

Package    Ecosystem    Vulnerable Range    Patched
PyTorch    pip          <= 0.10.0           No patch
snorkel                                     No patch

How severe is it?

CVSS 3.1: 8.8 / 10
EPSS: 0.4% chance of exploitation in 30 days (higher than 31% of all CVEs)
Exploitation Status: No known exploitation
Sophistication: Moderate

What is the attack surface?

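Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): Required
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High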

What should I do?

6 steps
  1. Upgrade snorkel to a version above v0.10.0 once a patched release is published; monitor the snorkel-team/snorkel GitHub repository for security advisories.

  2. Audit all torch.load() calls across your ML codebase and add weights_only=True to every call that does not require full object deserialization.

  3. Enforce SHA-256 hash or cryptographic signature verification on all model artifact files as part of your model registry intake process before loading.

  4. Integrate picklescan or fickling into your CI/CD pipeline to scan model files for malicious pickle payloads at artifact registration time.

  5. Restrict model loading to internally controlled, access-audited storage — do not allow training jobs to load models directly from public URLs or unverified external registries.

  6. Monitor ML training hosts for anomalous subprocess spawns, unexpected outbound network connections, or credential access events that could indicate post-exploitation activity.
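Step 2 can be partly automated. The sketch below uses Python's `ast` module to list `torch.load(...)` calls that do not pass a literal `weights_only=True`. The function name `find_unsafe_torch_load` is hypothetical, and the matcher is intentionally narrow: it only recognizes the literal `torch.load(` spelling, so aliased imports such as `from torch import load` would need extra handling.

```python
import ast


def find_unsafe_torch_load(source: str, filename: str = "<string>") -> list[int]:
    """Return line numbers of torch.load(...) calls lacking weights_only=True."""
    unsafe = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the literal attribute form `torch.load(...)` only.
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        # Only an explicit literal True counts as safe in this sketch.
        explicit_true = any(
            kw.arg == "weights_only"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if not explicit_true:
            unsafe.append(node.lineno)
    return sorted(unsafe)


sample = '''
import torch
a = torch.load("a.pt")
b = torch.load("b.pt", weights_only=True)
c = torch.load("c.pt", map_location="cpu", weights_only=False)
'''
print(find_unsafe_torch_load(sample))  # [3, 5]
```

Calls that pass the flag through a variable are reported as well, which errs on the side of manual review.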

What does CISA's SSVC say?

Decision: Track
Exploitation: none
Automatable: No
Technical Impact: total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2 - AI system lifecycle management A.9.4 - Security of AI systems
NIST AI RMF
MANAGE 2.2 - Risk response for AI supply chain
OWASP LLM Top 10
LLM05:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2026-31224?

CVE-2026-31224 is an insecure deserialization vulnerability (CWE-502) in the snorkel library through v0.10.0. MultitaskClassifier.load() calls torch.load() without weights_only=True, so loading a crafted model file can deserialize arbitrary Python objects via pickle and execute code on the loading host with the privileges of the loading process.

Is CVE-2026-31224 actively exploited?

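No confirmed active exploitation of CVE-2026-31224 has been reported. Because no patched release is available yet, organizations should apply the mitigations listed in this advisory proactively and upgrade once a fix is published.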

How to fix CVE-2026-31224?

  1. Upgrade snorkel to a version above v0.10.0 once a patched release is published; monitor the snorkel-team/snorkel GitHub repository for security advisories.

  2. Audit all torch.load() calls across your ML codebase and add weights_only=True to every call that does not require full object deserialization.

  3. Enforce SHA-256 hash or cryptographic signature verification on all model artifact files as part of your model registry intake process before loading.

  4. Integrate picklescan or fickling into your CI/CD pipeline to scan model files for malicious pickle payloads at artifact registration time.

  5. Restrict model loading to internally controlled, access-audited storage; do not allow training jobs to load models directly from public URLs or unverified external registries.

  6. Monitor ML training hosts for anomalous subprocess spawns, unexpected outbound network connections, or credential access events that could indicate post-exploitation activity.
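To illustrate what the scanning step looks for, here is a rough standard-library sketch that lists the globals a pickle stream would import, without executing it. It is not a substitute for picklescan or fickling: the function names and allow list are hypothetical, and the `STACK_GLOBAL` handling is a heuristic that assumes the module and attribute names are the two most recently pushed strings. It also operates on raw pickle bytes, so archive-style checkpoint files would first need their embedded pickle extracted.

```python
import collections
import pickle
import pickletools


def referenced_globals(data: bytes) -> set[tuple[str, str]]:
    """List (module, name) pairs a pickle stream imports; nothing is executed."""
    found = set()
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name".
            module, name = arg.split(" ", 1)
            found.add((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic: module and name are the last two strings pushed.
            if len(recent_strings) >= 2:
                found.add((recent_strings[-2], recent_strings[-1]))
            else:
                found.add(("<unknown>", "<unknown>"))
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


def unexpected_globals(data: bytes, allowed: set[tuple[str, str]]) -> set[tuple[str, str]]:
    return referenced_globals(data) - allowed


ALLOWED = {("collections", "OrderedDict")}  # illustrative allow list

plain = pickle.dumps({"w": [1.0, 2.0]}, protocol=4)
ordered = pickle.dumps(collections.OrderedDict(w=[1.0]), protocol=4)
odd = pickle.dumps(int, protocol=4)  # references builtins.int

print(unexpected_globals(plain, ALLOWED))
print(unexpected_globals(ordered, ALLOWED))
print(unexpected_globals(odd, ALLOWED))
```

Reporting imported names rather than bare opcodes matters because legitimate checkpoints also contain global and reduce opcodes; the signal is which callables are referenced.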

What systems are affected by CVE-2026-31224?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, weak supervision workflows, MLOps pipelines, model serving, data labeling pipelines.

What is the CVSS score for CVE-2026-31224?

CVE-2026-31224 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 0.39%.

What is the AI security impact?

Affected AI Architectures

training pipelines, weak supervision workflows, MLOps pipelines, model serving, data labeling pipelines

MITRE ATLAS Techniques

AML.T0010.003 Model
AML.T0011.000 Unsafe AI Artifacts
AML.T0018.002 Embed Malware
AML.T0058 Publish Poisoned Models

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.2, A.9.4
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM05:2025

What are the technical details?

Original Advisory

The snorkel library through v0.10.0 contains an insecure deserialization vulnerability (CWE-502) in the MultitaskClassifier.load() method. The method loads model weight files using torch.load() without enabling the security-restrictive weights_only=True parameter. This default behavior allows the deserialization of arbitrary Python objects via the Pickle module. A remote attacker can exploit this by providing a maliciously crafted model file, leading to arbitrary code execution on the victim's system when the file is loaded via the vulnerable method.
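The advisory attributes the issue to unrestricted unpickling. The general idea behind a restricted load can be sketched with the standard library alone: an unpickler that resolves only allow-listed globals and rejects everything else. This is not PyTorch's or snorkel's implementation; `AllowListUnpickler`, `restricted_loads`, and the allow list are illustrative assumptions, following the pattern documented for `pickle.Unpickler.find_class`.

```python
import collections
import io
import pickle

# Illustrative allow list: only this global may be resolved while loading.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks to import a global.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    return AllowListUnpickler(io.BytesIO(data)).load()


weights = collections.OrderedDict(layer=[0.1, 0.2])
print(restricted_loads(pickle.dumps(weights)))

try:
    restricted_loads(pickle.dumps(int))  # refers to builtins.int, not allow-listed
except pickle.UnpicklingError as exc:
    print(exc)
```

The design choice is deny-by-default: a stream that names any callable outside the list fails before that callable can be invoked.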

Exploitation Scenario

An adversary with write access to a shared model artifact store — obtained via compromised CI/CD credentials, a misconfigured S3 bucket policy, or a malicious insider — uploads a crafted snorkel MultitaskClassifier model file containing an embedded pickle payload. The payload is disguised as a legitimate model checkpoint and placed in a path expected by an automated training pipeline. When the pipeline calls MultitaskClassifier.load() during a scheduled training run, torch.load() deserializes the file without restriction, executing the payload with the ML process's permissions. The attacker receives a reverse shell or exfiltrated cloud credentials, establishing a foothold in the ML infrastructure with access to training data and downstream production systems.
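The scenario depends on a checkpoint being swapped in place without the pipeline noticing. A minimal sketch of a digest check that would catch such a swap is shown below, assuming the expected SHA-256 value was recorded at registration time and is stored somewhere the attacker cannot write. The helper names and file name are hypothetical.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_hex: str) -> bool:
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "model.pt"
    checkpoint.write_bytes(b"trusted checkpoint bytes")
    pinned = sha256_of(checkpoint)  # recorded at registration time

    ok_before = verify_artifact(checkpoint, pinned)
    checkpoint.write_bytes(b"swapped checkpoint bytes")  # simulated tampering
    ok_after = verify_artifact(checkpoint, pinned)

print(ok_before, ok_after)  # True False
```

The check only helps if the pipeline refuses to call the loader when verification fails and if the pinned digest lives outside the artifact store itself.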

Weaknesses (CWE)

CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

  • [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
  • [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
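The HMAC mitigation mentioned above can be sketched as follows: the producer prepends a keyed tag to the serialized bytes, and the consumer verifies the tag before any deserialization takes place. The function names and the hard-coded demo key are assumptions for illustration only; a real deployment would fetch the key from a secrets manager.

```python
import hashlib
import hmac
import pickle

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def seal(payload: bytes, key: bytes) -> bytes:
    """Prepend an HMAC-SHA256 tag to the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest() + payload


def open_sealed(blob: bytes, key: bytes) -> bytes:
    """Return the payload only if its tag verifies; otherwise raise."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC mismatch: refusing to deserialize")
    return payload


key = b"demo-key-not-for-production"  # assumption: illustrative key only
blob = seal(pickle.dumps({"weights": [0.1, 0.2]}), key)

# Verification happens first; pickle.loads only sees authenticated bytes.
state = pickle.loads(open_sealed(blob, key))
print(state)

tampered = blob[:-1] + bytes([blob[-1] ^ 1])  # flip one bit of the payload
try:
    open_sealed(tampered, key)
except ValueError as exc:
    print(exc)
```

An HMAC proves the bytes came from a holder of the key and were not modified; it does not make the payload safe if the key holder itself is compromised.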

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
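As a check on the 8.8 figure, the base score can be recomputed from the vector using the metric weights and rounding rule from the CVSS v3.1 specification. This sketch handles only Scope: Unchanged, which is what this vector uses; the function names are illustrative.

```python
import math

AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}  # weights for Scope: Unchanged
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    iss = 1 - (1 - CIA[parts["C"]]) * (1 - CIA[parts["I"]]) * (1 - CIA[parts["A"]])
    impact = 6.42 * iss
    exploitability = (
        8.22 * AV[parts["AV"]] * AC[parts["AC"]]
        * PR_UNCHANGED[parts["PR"]] * UI[parts["UI"]]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 8.8
```

Here the impact sub-score is about 5.87 and the exploitability sub-score about 2.84; their sum, about 8.71, rounds up to 8.8.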

Timeline

Published: May 12, 2026
Last Modified: May 18, 2026
First Seen: May 12, 2026
