CVE-2026-31222 (Awaiting NVD): AI Security Vulnerability

CISO Take

The snorkel weak supervision library (through v0.10.0) allows arbitrary code execution when loading model checkpoint files. Its Trainer.load() method calls torch.load() without the weights_only=True safety flag, which permits unrestricted Pickle deserialization of attacker-controlled Python objects. Any ML team that loads snorkel checkpoints from shared repositories, collaborative training environments, or external sources risks arbitrary code execution on the host, with the privileges of the process that loads the file. The only interaction required is loading the file. No public exploit or active exploitation has been confirmed, and CISA has not added this CVE to its Known Exploited Vulnerabilities (KEV) catalog. Even so, the attack class is mature and easy to weaponize with standard Python tooling such as fickling. Immediate actions: do not call Trainer.load() on files from untrusted sources, restrict checkpoints to internal artifact registries, and scan existing checkpoints with fickling until an official patch is released.

Sources: NVD, MITRE ATLAS

Risk Assessment

Risk is HIGH despite the missing CVSS score. Insecure Pickle deserialization is a well-documented attack class with ready-made tooling, so the technical barrier to exploitation is near zero for any attacker who can deliver a crafted checkpoint. Exposure is concentrated in data science and ML engineering environments, where model sharing is routine and checkpoint provenance is rarely enforced. Two factors moderate the immediate urgency: no exploitation is known, and snorkel has a smaller footprint than LangChain or transformers. When triggered, however, the result is arbitrary code execution on training infrastructure, which is the most severe possible impact.

Attack Kill Chain

Artifact Delivery

Adversary crafts a malicious snorkel model checkpoint embedding a Pickle payload and delivers it via shared model registry, GitHub PR attachment, or direct transfer to the target ML team.

AML.T0011.000

Deserialization Trigger

Victim calls Trainer.load() on the crafted checkpoint; snorkel invokes torch.load() without weights_only=True, triggering unrestricted Pickle deserialization of adversary-controlled Python objects.

AML.T0018.002
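The Deserialization Trigger step depends on the default value of weights_only, and that default changed across PyTorch releases. Starting with PyTorch 2.6, torch.load() defaults to weights_only=True. Earlier releases default to False. The helper below is a minimal triage sketch. Its name is hypothetical. It assumes that snorkel's call site omits the argument entirely, as the NVD description implies, rather than passing weights_only=False. Confirm that assumption against the snorkel source you actually run.

```python
import re


def bare_torch_load_is_restricted(torch_version: str) -> bool:
    """Hypothetical triage helper.

    Returns True when torch.load() called WITHOUT a weights_only argument
    defaults to the restricted (weights-only) unpickler, i.e. PyTorch 2.6+.
    Local build tags ("+cu121") and pre-release suffixes are ignored; only
    major.minor is compared. Assumes the caller omits weights_only.
    """
    match = re.match(r"(\d+)\.(\d+)", str(torch_version))
    if match is None:
        raise ValueError(f"unrecognized PyTorch version: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 6)


if __name__ == "__main__":
    for version in ("1.13.1", "2.5.1+cu121", "2.6.0"):
        print(version, bare_torch_load_is_restricted(version))
```

In a live environment you would pass torch.__version__. A True result does not make the environment safe on its own, because callers can still pass weights_only=False explicitly. Environments pinned to older PyTorch releases remain exposed.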

Code Execution

The embedded Pickle __reduce__ payload executes arbitrary Python code under the victim process's privileges, establishing attacker control over the training host.

AML.T0112.001
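The Code Execution step can be demonstrated with the standard library alone. This is a minimal, deliberately harmless sketch. The class name is hypothetical, and os.getcwd stands in for the os.system or subprocess call that a real payload would use. __reduce__ tells pickle to rebuild the object by calling a function chosen by the file's author, and unpickling performs that call.

```python
import os
import pickle


class CraftedCheckpoint:
    """Hypothetical stand-in for an object embedded in a malicious checkpoint."""

    def __reduce__(self):
        # pickle records "call os.getcwd() with no arguments" instead of the
        # object's state. A real payload would name os.system or similar.
        return (os.getcwd, ())


blob = pickle.dumps(CraftedCheckpoint())

# Loading never rebuilds a CraftedCheckpoint: the unpickler imports os.getcwd,
# calls it, and hands back whatever that call returned.
restored = pickle.loads(blob)
print(type(restored).__name__, restored == os.getcwd())
```

The script prints `str True`, which confirms that the loader executed the call chosen by the file's author. When torch.load() runs without weights_only=True, it resolves globals and invokes reduce callables in the same unrestricted way.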

Impact

Attacker exfiltrates training datasets, cloud credentials, and API keys from the compromised ML environment, or installs a persistent backdoor on training infrastructure for ongoing access.

AML.T0025

Severity & Risk

CVSS 3.1: N/A
EPSS: N/A
Exploitation Status: No known exploitation
Sophistication: Trivial

Recommended Action

Monitor snorkel-team/snorkel releases for a patch; upgrade immediately when available.
As an immediate workaround, prohibit calls to Trainer.load() with checkpoint files from any source outside your internal, access-controlled artifact registry.
Enforce artifact provenance: all model checkpoints must originate from internal registries (MLflow Model Registry, DVC remote, private S3/GCS with strict ACLs).
Scan existing checkpoint files with fickling: pip install fickling && python -m fickling --check-safety <checkpoint.pkl>. Treat flagged files as malicious. A clean result is a triage signal, not proof of safety.
In environments where snorkel cannot be removed, load checkpoints only inside an isolated sandbox, for example a separate low-privilege process or container with no credentials and no network access. Where feasible, store model weights in the safetensors format instead.
Monitor training job logs for anomalous subprocess spawning, unexpected network connections, or file writes during checkpoint loading.
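Alongside fickling, a quick standard-library triage pass can list the Python globals that a checkpoint's pickle stream would import, without deserializing it. This is a heuristic sketch, not fickling's API. The function names and the EXPECTED_GLOBALS allowlist are hypothetical. The sketch assumes either torch.save()'s zip layout (pickle streams stored as *.pkl members) or a single raw pickle. A determined attacker may be able to obfuscate references that it cannot resolve, so any unexpected entry should trigger quarantine and a full fickling review.

```python
import pickletools
import zipfile

# Hypothetical allowlist: globals a plain state_dict checkpoint commonly references.
EXPECTED_GLOBALS = {
    ("collections", "OrderedDict"),
    ("torch._utils", "_rebuild_tensor_v2"),
    ("torch._utils", "_rebuild_parameter"),
    ("torch", "FloatStorage"),
    ("torch", "LongStorage"),
}

STRING_OPS = {"STRING", "BINSTRING", "SHORT_BINSTRING", "UNICODE",
              "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}


def referenced_globals(stream: bytes) -> list[tuple[str, str]]:
    """Return (module, name) pairs a pickle stream would import, without loading it."""
    found = []
    memo = {}
    pushed = []  # approximate stack pushes: str for string literals, None otherwise
    for opcode, arg, _pos in pickletools.genops(stream):
        op = opcode.name
        if op in ("GLOBAL", "INST"):
            module, _, name = arg.partition(" ")
            found.append((module, name))
        elif op == "STACK_GLOBAL":
            # Protocol 4+: module and name are the two most recent pushes.
            if len(pushed) >= 2 and all(isinstance(x, str) for x in pushed[-2:]):
                found.append((pushed[-2], pushed[-1]))
            else:
                found.append(("<unresolved>", "<unresolved>"))  # fail closed
        # Track string literals and the memo so operands fetched via GET/BINGET resolve.
        if op in STRING_OPS:
            pushed.append(arg)
        elif op == "MEMOIZE":
            memo[len(memo)] = pushed[-1] if pushed else None
        elif op in ("PUT", "BINPUT", "LONG_BINPUT"):
            memo[arg] = pushed[-1] if pushed else None
        elif op in ("GET", "BINGET", "LONG_BINGET"):
            pushed.append(memo.get(arg))
        elif opcode.stack_after:
            pushed.append(None)
    return found


def pickle_streams(path: str) -> list[bytes]:
    """*.pkl members of a torch.save() zip, otherwise the raw file.

    genops stops at the first STOP opcode, so for legacy (non-zip) torch files
    only the first pickle stream is inspected.
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return [archive.read(n) for n in archive.namelist() if n.endswith(".pkl")]
    with open(path, "rb") as handle:
        return [handle.read()]


def unexpected_globals(path: str, allowed=EXPECTED_GLOBALS) -> list[tuple[str, str]]:
    """Globals referenced by the checkpoint that are not on the allowlist."""
    return [ref for stream in pickle_streams(path)
            for ref in referenced_globals(stream) if ref not in allowed]
```

An empty result means only that every reference matched the allowlist. The checkpoint still needs to come from a trusted registry.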

Classification

Code Execution Supply Chain Model Training Data
AML.T0010.001 - AI Software
AML.T0011.000 - Unsafe AI Artifacts
AML.T0018.002 - Embed Malware
AML.T0112.001 - AI Artifacts

Compliance Impact

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.1.2 - Information security risk assessment
A.8.4 - AI system security

NIST AI RMF

MANAGE 2.2 - Treatments, responses, and prioritization of AI risks

OWASP LLM Top 10

LLM03:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2026-31222?

CVE-2026-31222 is an insecure deserialization vulnerability (CWE-502) in the snorkel weak supervision library through v0.10.0. Trainer.load() calls torch.load() without weights_only=True, so loading a maliciously crafted checkpoint can execute arbitrary code with the privileges of the loading process. Teams that load snorkel checkpoints from shared repositories, collaborative training environments, or external sources are exposed.

Is CVE-2026-31222 actively exploited?

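No confirmed active exploitation of CVE-2026-31222 has been reported, and CISA has not added it to the KEV catalog. No fixed release is available yet, so organizations should apply the recommended workarounds now and upgrade as soon as a patch ships.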

How to fix CVE-2026-31222?

1. Monitor snorkel-team/snorkel releases for a patch and upgrade as soon as one is available.
2. As an immediate workaround, prohibit calls to Trainer.load() with checkpoint files from any source outside your internal, access-controlled artifact registry.
3. Enforce artifact provenance: all model checkpoints must originate from internal registries (MLflow Model Registry, DVC remote, private S3/GCS with strict ACLs).
4. Scan existing checkpoint files with fickling: `pip install fickling && python -m fickling --check-safety <checkpoint.pkl>`. Treat flagged files as malicious. A clean result is a triage signal, not proof of safety.
5. In environments where snorkel cannot be removed, load checkpoints only inside an isolated sandbox, for example a separate low-privilege process or container with no credentials and no network access. Where feasible, store model weights in the safetensors format instead.
6. Monitor training job logs for anomalous subprocess spawning, unexpected network connections, or file writes during checkpoint loading.
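The monitoring step can be partly automated inside the training process itself with Python's runtime audit hooks (PEP 578, Python 3.8+). This sketch records process-spawn, network, file-write, and pickle.find_class events raised while a checkpoint is being loaded. The helper names and the choice of events are assumptions. A payload that runs native code, or that does its work outside the watched thread, can evade this check. Treat it as a tripwire that complements host-level monitoring, not a replacement for it.

```python
import contextlib
import sys
import threading

# Audit events worth flagging during checkpoint loading (selection is an assumption).
WATCHED_EVENTS = {
    "pickle.find_class",  # every global the unpickler resolves
    "os.system", "os.exec", "os.spawn", "os.posix_spawn", "subprocess.Popen",
    "socket.connect", "socket.getaddrinfo",
}

_local = threading.local()


def _audit_hook(event, args):
    log = getattr(_local, "log", None)
    if log is None:
        return  # no watch active on this thread
    if event in WATCHED_EVENTS:
        log.append((event, args))
    elif event == "open":
        mode = args[1] if len(args) > 1 else None
        if isinstance(mode, str) and any(c in mode for c in "wax+"):
            log.append((event, args))  # file opened for writing


# Audit hooks cannot be removed once installed, so this hook stays passive
# unless a watch_checkpoint_load() block is active on the current thread.
sys.addaudithook(_audit_hook)


@contextlib.contextmanager
def watch_checkpoint_load():
    """Collect watched audit events raised on this thread inside the block."""
    _local.log = []
    events = _local.log
    try:
        yield events
    finally:
        _local.log = None
```

To use it, wrap the Trainer.load() call in `with watch_checkpoint_load() as events:` and send the collected events to your job logs. Raise an alert on anything beyond the pickle.find_class entries that an ordinary checkpoint produces.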

What systems are affected by CVE-2026-31222?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, weak supervision pipelines, ML experiment environments, shared GPU clusters, and CI/CD ML training jobs.

What is the CVSS score for CVE-2026-31222?

No CVSS score has been assigned yet.

Technical Details

NVD Description

The snorkel library thru v0.10.0 contains an insecure deserialization vulnerability (CWE-502) in the Trainer.load() method of the Trainer class. The method loads model checkpoint files using torch.load() without enabling the security-restrictive weights_only=True parameter. This default behavior allows the deserialization of arbitrary Python objects via the Pickle module. A remote attacker can exploit this by providing a maliciously crafted model file, leading to arbitrary code execution on the victim's system when the file is loaded via the vulnerable method.
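The fix that the description points to is passing weights_only=True. That setting restricts torch.load() to tensors, primitive containers, and a small allowlist of types, instead of resolving any global the file names. The same principle can be sketched with the standard library's documented find_class override. This is a conceptual illustration, not PyTorch's implementation. The SAFE_GLOBALS allowlist and the helper names are hypothetical.

```python
import io
import os
import pickle
from collections import OrderedDict

# Hypothetical allowlist of (module, name) pairs the loader may resolve.
SAFE_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    """Refuses to import any global that is not explicitly allowlisted."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()


class CraftedCheckpoint:
    """Harmless stand-in for a malicious object (os.getcwd instead of os.system)."""

    def __reduce__(self):
        return (os.getcwd, ())


# Plain containers and allowlisted types load normally...
state = restricted_loads(pickle.dumps(OrderedDict(weight=[0.1, 0.2])))
print(state)

# ...while the crafted object is rejected before os.getcwd is ever called.
try:
    restricted_loads(pickle.dumps(CraftedCheckpoint()))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

The allowlist approach fails closed: any type not listed is rejected, so an attacker cannot use a new or unexpected callable to bypass it.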

Exploitation Scenario

An adversary targeting an ML engineering team publishes a poisoned snorkel checkpoint to a shared model registry or sends it through a collaborative channel (Slack, a GitHub PR attachment, email). A data scientist calls Trainer.load('checkpoint.pt') to resume training. Snorkel invokes torch.load() on the crafted file, triggering unrestricted Pickle deserialization. A __reduce__ payload in the file spawns a reverse shell or drops a cron-based backdoor that runs as the data scientist's user account. From this foothold the attacker exfiltrates training datasets, AWS/GCP credentials from environment variables, and API keys. The attacker then pivots to the cloud storage buckets attached to the training environment, all without needing any privileged access.