CVE-2026-31222: snorkel: RCE via insecure model checkpoint loading

GHSA-78cp-f66x-qmh5 HIGH
Published May 12, 2026
CISO Take

The snorkel weak supervision library (through v0.10.0) allows arbitrary code execution when loading model checkpoint files because its Trainer.load() method calls torch.load() without the weights_only=True safety flag, enabling unrestricted Pickle deserialization of attacker-controlled Python objects. Any ML team that loads snorkel checkpoints from shared repositories, collaborative training environments, or external sources is exposed to full host compromise with no interaction beyond opening a file. No public exploit or active exploitation is confirmed and CISA has not added this to KEV, but the attack class is mature and trivially weaponizable with standard Python tooling such as fickling. Immediate action: avoid calling Trainer.load() with files from untrusted sources, enforce internal-only artifact registries, and scan existing checkpoints with fickling pending an official patch.

Sources: NVD, ATLAS

What is the risk?

Risk is HIGH (CVSS 3.1 base score 8.8). Insecure Pickle deserialization is a well-documented attack class with ready-made tooling, so the technical barrier to exploitation is near zero for any attacker who can deliver a crafted checkpoint. Exposure is concentrated in data science and ML engineering environments where model sharing is routine and checkpoint provenance is rarely enforced. The absence of active exploitation and the niche footprint of snorkel (versus LangChain or transformers) moderate immediate urgency, but a successful RCE on training infrastructure has maximum-severity impact.

How does the attack unfold?

Artifact Delivery
Adversary crafts a malicious snorkel model checkpoint embedding a Pickle payload and delivers it via shared model registry, GitHub PR attachment, or direct transfer to the target ML team.
AML.T0011.000
Deserialization Trigger
Victim calls Trainer.load() on the crafted checkpoint; snorkel invokes torch.load() without weights_only=True, triggering unrestricted Pickle deserialization of adversary-controlled Python objects.
AML.T0018.002
Code Execution
The embedded Pickle __reduce__ payload executes arbitrary Python code under the victim process's privileges, establishing attacker control over the training host.
AML.T0112.001
Impact
Attacker exfiltrates training datasets, cloud credentials, and API keys from the compromised ML environment, or installs a persistent backdoor on training infrastructure for ongoing access.
AML.T0025
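
The deserialization and code execution steps above can be illustrated with a harmless sketch. It uses only the standard library pickle module, not snorkel or torch, and the Payload class is hypothetical. The callable chosen here is os.getpid, standing in for whatever function an attacker would pick.

```python
import os
import pickle


class Payload:
    """Hypothetical object whose pickle form is 'call this function on load'."""

    def __reduce__(self):
        # Pickle records the callable and its arguments instead of object state.
        # os.getpid is a benign stand-in for an attacker-chosen function.
        return (os.getpid, ())


blob = pickle.dumps(Payload())  # serializing does not call os.getpid
result = pickle.loads(blob)     # loading calls it inside the victim process

# The loader gets back the function's return value, not a Payload instance.
print(type(result).__name__, result == os.getpid())
```

The point of the sketch is that the call happens inside pickle.loads itself, before the caller has any chance to inspect what was loaded.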

What systems are affected?

Package Ecosystem Vulnerable Range Patched
PyTorch pip <= 0.10.0 No patch
snorkel No patch

How severe is it?

CVSS 3.1
8.8 / 10
EPSS
0.4%
chance of exploitation in 30 days
Higher than 31% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Trivial

What is the attack surface?

AV AC PR UI S C I A
AV Network
AC Low
PR None
UI Required
S Unchanged
C High
I High
A High

What should I do?

6 steps
  1. Monitor snorkel-team/snorkel releases for a patch; upgrade immediately when available.

  2. As an immediate workaround, prohibit calls to Trainer.load() with checkpoint files from any source outside your internal, access-controlled artifact registry.

  3. Enforce artifact provenance — all model checkpoints must originate from internal registries (MLflow Model Registry, DVC remote, private S3/GCS with strict ACLs).

  4. Scan existing checkpoint files with fickling: pip install fickling && python -m fickling --check-safety <checkpoint.pkl>. Payloads that fickling considers unsafe are flagged, but a clean result does not prove that a file is safe.

  5. In environments where snorkel cannot be removed, wrap torch.load() calls in a subprocess sandbox or use safetensors format for model weights where feasible.

  6. Monitor training job logs for anomalous subprocess spawning, unexpected network connections, or file writes during checkpoint loading.
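
As a rough illustration of what a static scan in step 4 looks for, the sketch below walks a pickle stream with the standard library pickletools module and lists the module-level names it would import, without loading it. This is not how fickling is implemented. RISKY_MODULES, pickle_globals, risky_globals, and Probe are hypothetical names, and the STACK_GLOBAL handling is a heuristic. Recent PyTorch versions save checkpoints as a zip archive, so the embedded pickle member would have to be extracted before a scan like this could run on it.

```python
import os
import pickle
import pickletools

# Hypothetical denylist for illustration; a real policy would be broader.
RISKY_MODULES = {"os", "posix", "nt", "subprocess", "sys", "socket", "shutil", "builtins"}


def pickle_globals(data):
    """List the 'module.name' references in a pickle stream without loading it."""
    found = set()
    recent_strings = []  # string pushes, used to approximate STACK_GLOBAL operands
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            module, name = arg.split(" ", 1)
            found.add(module + "." + name)
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            found.add(recent_strings[-2] + "." + recent_strings[-1])
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


def risky_globals(data):
    """Keep only references whose top-level module is on the denylist."""
    return {ref for ref in pickle_globals(data) if ref.split(".", 1)[0] in RISKY_MODULES}


class Probe:
    def __reduce__(self):
        return (os.getpid, ())  # benign stand-in for a malicious callable


flagged = risky_globals(pickle.dumps(Probe()))               # serialized only, never loaded
clean = risky_globals(pickle.dumps({"weights": [0.1, 0.2]}))  # plain data, no imports
print(flagged, clean)
```

A denylist like this is easy to bypass, which is why the scan is a triage aid rather than a substitute for steps 2 and 3.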

What does CISA's SSVC say?

Decision: Track
Exploitation: none
Automatable: No
Technical Impact: total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.1.2 - Information security risk assessment
A.8.4 - AI system security
NIST AI RMF
MANAGE 2.2 - Treatments, responses, and prioritization of AI risks
OWASP LLM Top 10
LLM03:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

Is CVE-2026-31222 actively exploited?

No confirmed active exploitation of CVE-2026-31222 has been reported. No patch is available yet, so organizations should apply the workarounds now and upgrade as soon as a fixed release ships.

How to fix CVE-2026-31222?

  1. Monitor snorkel-team/snorkel releases for a patch; upgrade immediately when available.
  2. As an immediate workaround, prohibit calls to Trainer.load() with checkpoint files from any source outside your internal, access-controlled artifact registry.
  3. Enforce artifact provenance: all model checkpoints must originate from internal registries (MLflow Model Registry, DVC remote, private S3/GCS with strict ACLs).
  4. Scan existing checkpoint files with fickling: `pip install fickling && python -m fickling --check-safety <checkpoint.pkl>`. Payloads that fickling considers unsafe are flagged, but a clean result does not prove that a file is safe.
  5. In environments where snorkel cannot be removed, wrap torch.load() calls in a subprocess sandbox or use safetensors format for model weights where feasible.
  6. Monitor training job logs for anomalous subprocess spawning, unexpected network connections, or file writes during checkpoint loading.

What systems are affected by CVE-2026-31222?

This vulnerability affects the following AI/ML architecture patterns: Training pipelines, Weak supervision pipelines, ML experiment environments, Shared GPU clusters, CI/CD ML training jobs.

What is the CVSS score for CVE-2026-31222?

CVE-2026-31222 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 0.39%.

What is the AI security impact?

Affected AI Architectures

Training pipelines
Weak supervision pipelines
ML experiment environments
Shared GPU clusters
CI/CD ML training jobs

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0011.000 Unsafe AI Artifacts
AML.T0018.002 Embed Malware
AML.T0112.001 AI Artifacts

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.1.2, A.8.4
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM03:2025

What are the technical details?

Original Advisory

The snorkel library through v0.10.0 contains an insecure deserialization vulnerability (CWE-502) in the Trainer.load() method of the Trainer class. The method loads model checkpoint files using torch.load() without enabling the security-restrictive weights_only=True parameter. This default behavior allows the deserialization of arbitrary Python objects via the Pickle module. A remote attacker can exploit this by providing a maliciously crafted model file, leading to arbitrary code execution on the victim's system when the file is loaded via the vulnerable method.
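
The difference between unrestricted and restricted loading can be sketched with the standard library alone. The example below follows the allowlist pattern from the Python pickle documentation: an Unpickler subclass that refuses any global not explicitly permitted. It is a conceptual illustration of restricted loading, not PyTorch's weights_only implementation, and ALLOWED, AllowlistUnpickler, safe_loads, and Probe are hypothetical names.

```python
import collections
import io
import os
import pickle

# Hypothetical allowlist: the only global this loader will resolve.
ALLOWED = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks to import a module-level name.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("blocked global: %s.%s" % (module, name))


def safe_loads(data):
    """Deserialize data, rejecting any global outside ALLOWED."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Probe:
    def __reduce__(self):
        return (os.getpid, ())  # benign stand-in for a malicious callable


state = collections.OrderedDict(layer1=[0.1, 0.2])
restored = safe_loads(pickle.dumps(state))  # allowed type loads normally

try:
    safe_loads(pickle.dumps(Probe()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True  # the callable was never resolved, so it never ran

print(restored == state, blocked)
```

With this loader the crafted stream fails at import resolution, so the embedded callable is never invoked.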

Exploitation Scenario

An adversary targeting an ML engineering team publishes a poisoned snorkel checkpoint to a shared model registry or sends it via a collaborative channel (Slack, GitHub PR attachment, email). The victim data scientist calls Trainer.load('checkpoint.pt') to resume training. Snorkel invokes torch.load() on the crafted file, triggering unrestricted Pickle deserialization. A __reduce__ payload in the file spawns a reverse shell or drops a cron-based backdoor running as the data science user. From this foothold the attacker exfiltrates training datasets, AWS/GCP credentials held in environment variables, and API keys, then pivots to cloud storage buckets attached to the training environment. None of this requires prior privileged access.

Weaknesses (CWE)

CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

  • [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
  • [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.

Source: MITRE CWE corpus.
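
The HMAC mitigation mentioned above can be sketched as follows, assuming the checkpoint bytes and a secret key are available to both the producer and the consumer. sign_checkpoint and verify_checkpoint are hypothetical helper names. Key distribution and storage are outside the scope of the sketch.

```python
import hashlib
import hmac


def sign_checkpoint(data, key):
    """Return a hex HMAC-SHA256 tag over the raw checkpoint bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_checkpoint(data, key, expected_tag):
    """Check the tag in constant time before the bytes reach any deserializer."""
    actual_tag = sign_checkpoint(data, key)
    return hmac.compare_digest(actual_tag, expected_tag)


key = b"demo-key-not-for-production"  # placeholder secret for the example
checkpoint = b"serialized model bytes"
tag = sign_checkpoint(checkpoint, key)

ok = verify_checkpoint(checkpoint, key, tag)                      # untouched file
tampered = verify_checkpoint(checkpoint + b"\x00", key, tag)      # modified file
print(ok, tampered)
```

Verification only shows that the file was produced by someone holding the key. It does not make the pickle content itself safe, so it complements restricted loading rather than replacing it.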

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
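
For readers who want to check how this vector maps to the 8.8 base score, the sketch below applies the CVSS v3.1 base score equations for the Scope: Unchanged case. The metric weights and the round-up rule follow the published v3.1 specification. WEIGHTS, roundup, and base_score are names chosen for this example, and the Scope: Changed branch is deliberately left out.

```python
import math

# CVSS v3.1 metric weights (PR values are those for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place using the integer method from the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H")
print(score)  # 8.8
```

Here the impact term is about 5.87 and the exploitability term about 2.84. Their sum of roughly 8.71 rounds up to 8.8.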

Timeline

Published
May 12, 2026
Last Modified
May 18, 2026
First Seen
May 12, 2026
