CVE-2026-31222: snorkel: RCE via insecure model checkpoint loading
The snorkel weak supervision library (through v0.10.0) allows arbitrary code execution when loading model checkpoint files because its Trainer.load() method calls torch.load() without the weights_only=True safety flag, enabling unrestricted Pickle deserialization of attacker-controlled Python objects. Any ML team that loads snorkel checkpoints from shared repositories, collaborative training environments, or external sources is exposed to full host compromise with no interaction beyond opening a file. No public exploit or active exploitation is confirmed and CISA has not added this to KEV, but the attack class is mature and trivially weaponizable with standard Python tooling such as fickling. Immediate action: avoid calling Trainer.load() with files from untrusted sources, enforce internal-only artifact registries, and scan existing checkpoints with fickling pending an official patch.
Risk Assessment
Risk is HIGH despite missing CVSS score. Insecure Pickle deserialization is a well-documented attack class with ready-made tooling, placing the technical barrier to exploit at near-zero for any attacker who can deliver a crafted checkpoint. Exposure is concentrated in data science and ML engineering environments where model sharing is routine and checkpoint provenance is rarely enforced. The absence of active exploitation and the niche footprint of snorkel (versus LangChain or transformers) moderate immediate urgency, but the RCE impact on training infrastructure is maximum severity when triggered.
Recommended Action
1. Monitor snorkel-team/snorkel releases for a patch; upgrade immediately when available.
2. As an immediate workaround, prohibit calls to Trainer.load() with checkpoint files from any source outside your internal, access-controlled artifact registry.
3. Enforce artifact provenance: all model checkpoints must originate from internal registries (MLflow Model Registry, DVC remote, private S3/GCS with strict ACLs).
4. Scan existing checkpoint files with fickling: `pip install fickling && python -m fickling --check-safety <checkpoint.pkl>`; malicious payloads will be flagged.
5. In environments where snorkel cannot be removed, wrap torch.load() calls in a subprocess sandbox or use the safetensors format for model weights where feasible.
6. Monitor training job logs for anomalous subprocess spawning, unexpected network connections, or file writes during checkpoint loading.
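Pending a fickling scan, the check in step 4 can be approximated with the standard library alone: walk a pickle stream's opcodes and flag the ones a `__reduce__` payload needs to import and call attacker-chosen code. A minimal sketch, assuming nothing about snorkel's checkpoint layout (the function name and opcode set are illustrative, not part of snorkel or fickling):

```python
import pickle
import pickletools

# Opcodes that resolve globals or invoke callables; a __reduce__ payload
# cannot work without them, while plain-data pickles never emit them.
DANGEROUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def scan_pickle_bytes(data: bytes) -> set:
    """Return the set of potentially dangerous opcodes found in a pickle stream."""
    found = set()
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in DANGEROUS_OPCODES:
            found.add(opcode.name)
    return found

# A benign pickle of plain weight data contains no import/call opcodes.
benign = pickle.dumps({"weights": [0.1, 0.2]})

# A __reduce__ payload must emit GLOBAL/STACK_GLOBAL plus REDUCE to run code.
class Payload:
    def __reduce__(self):
        return (print, ("pwned",))

malicious = pickle.dumps(Payload())

print(scan_pickle_bytes(benign))
print(scan_pickle_bytes(malicious))
```

This flags the opcode classes, not intent, so treat any hit on a checkpoint of unknown provenance as reason to quarantine the file rather than load it.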
Frequently Asked Questions
What is CVE-2026-31222?
CVE-2026-31222 is an insecure deserialization vulnerability (CWE-502) in the snorkel weak supervision library (through v0.10.0). The Trainer.load() method calls torch.load() without the weights_only=True safety flag, so loading a maliciously crafted checkpoint file executes attacker-controlled code on the host. No public exploit or active exploitation is confirmed, but the attack class is mature and trivially weaponizable.
Is CVE-2026-31222 actively exploited?
No confirmed active exploitation of CVE-2026-31222 has been reported, but organizations should still patch proactively.
How to fix CVE-2026-31222?
Until a patched release is available: (1) load checkpoints only from your internal, access-controlled artifact registry; (2) scan existing checkpoint files with fickling (`pip install fickling && python -m fickling --check-safety <checkpoint.pkl>`); (3) where snorkel cannot be removed, wrap torch.load() calls in a subprocess sandbox or use the safetensors format for model weights where feasible; and (4) monitor training job logs for anomalous subprocess spawning, unexpected network connections, or file writes during checkpoint loading. Monitor snorkel-team/snorkel releases and upgrade immediately when a patch ships.
What systems are affected by CVE-2026-31222?
This vulnerability affects the following AI/ML architecture patterns: Training pipelines, Weak supervision pipelines, ML experiment environments, Shared GPU clusters, CI/CD ML training jobs.
What is the CVSS score for CVE-2026-31222?
No CVSS score has been assigned yet.
Technical Details
NVD Description
The snorkel library through v0.10.0 contains an insecure deserialization vulnerability (CWE-502) in the Trainer.load() method. The method loads model checkpoint files using torch.load() without enabling the security-restrictive weights_only=True parameter, a default that allows deserialization of arbitrary Python objects via the Pickle module. A remote attacker can exploit this by providing a maliciously crafted model file, leading to arbitrary code execution on the victim's system when the file is loaded via the vulnerable method.
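Conceptually, weights_only=True swaps in a restricted unpickler that refuses to resolve arbitrary globals, so a `__reduce__` payload fails before any code runs. The mechanism can be illustrated with the standard library alone; this is a sketch of the general idea, and `RestrictedUnpickler`/`safe_loads` are illustrative names, not snorkel or PyTorch APIs:

```python
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    """Refuse to resolve any global, so payloads cannot import callables.

    This mirrors the idea behind torch.load(..., weights_only=True): plain
    data structures deserialize fine; anything needing a class or function fails.
    """
    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            f"blocked global during unpickling: {module}.{name}"
        )

def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain weight-like data round-trips fine...
ok = safe_loads(pickle.dumps({"layer1.weight": [0.1, 0.2]}))

# ...but a crafted __reduce__ payload is stopped before any code runs.
class Payload:
    def __reduce__(self):
        return (print, ("attacker code would run here",))

try:
    safe_loads(pickle.dumps(Payload()))
except pickle.UnpicklingError as e:
    print("blocked:", e)
```

PyTorch's real weights_only mode additionally allow-lists the tensor and storage types a checkpoint legitimately needs; the sketch above shows only the blocking half of that design.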
Exploitation Scenario
An adversary targeting an ML engineering team publishes a poisoned snorkel checkpoint to a shared model registry or sends it via a collaborative channel (Slack, GitHub PR attachment, email). The victim data scientist calls Trainer.load('checkpoint.pt') to resume training. Snorkel invokes torch.load() on the crafted file, triggering unrestricted Pickle deserialization. A __reduce__ payload in the file spawns a reverse shell or drops a cron-based backdoor running as the data science user. From this foothold the attacker exfiltrates training datasets, AWS/GCP credentials from environment variables, API keys, and pivots to cloud storage buckets attached to the training environment — all without any privileged access being required.
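The `__reduce__` mechanism in the scenario above can be demonstrated harmlessly: pickle treats the returned (callable, args) pair as rebuild instructions and invokes the callable during loading, with no action needed from the victim beyond opening the file. A sketch (`MaliciousCheckpoint` is a hypothetical stand-in; a real payload would return something like `(os.system, ...)` instead of a benign builtin):

```python
import pickle

class MaliciousCheckpoint:
    # __reduce__ tells pickle: "to rebuild me, call this callable with these
    # args". A real attack would invoke os.system or similar; here a harmless
    # builtin shows that loading alone executes the call.
    def __reduce__(self):
        return (sorted, ("cab",))

blob = pickle.dumps(MaliciousCheckpoint())
result = pickle.loads(blob)   # the callable runs during load, not before
print(result)                 # ['a', 'b', 'c'] — proof the call executed
```

Note that serialization itself is inert; the danger is entirely on the loading side, which is why checkpoint provenance controls matter more than anything the file's author claims about it.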
Related Vulnerabilities
- CVE-2025-59528 (CVSS 10.0) Flowise: Unauthenticated RCE via MCP config injection (same attack type: Supply Chain)
- CVE-2024-2912 (CVSS 10.0) BentoML: RCE via insecure deserialization (same attack type: Supply Chain)
- CVE-2023-3765 (CVSS 10.0) MLflow: path traversal allows arbitrary file read (same attack type: Supply Chain)
- CVE-2025-5120 (CVSS 10.0) smolagents: sandbox escape enables unauthenticated RCE (same attack type: Supply Chain)
- CVE-2026-21858 (CVSS 10.0) n8n: Input Validation flaw enables exploitation (same attack type: Code Execution)