The snorkel weak supervision library (through v0.10.0) allows arbitrary code execution when loading model checkpoint files. Its Trainer.load() method calls torch.load() without the weights_only=True safety flag, which enables unrestricted Pickle deserialization of attacker-controlled Python objects. Any ML team that loads snorkel checkpoints from shared repositories, collaborative training environments, or external sources is exposed to code execution on the host under the account that loads the file, with no interaction required beyond loading a crafted checkpoint. No public exploit or active exploitation has been confirmed, and CISA has not added this CVE to KEV. However, the attack class is mature and trivially weaponizable with standard Python tooling such as fickling. Immediate action, pending an official patch: avoid calling Trainer.load() on files from untrusted sources, enforce internal-only artifact registries, and scan existing checkpoints with fickling.
What is the risk?
Risk is HIGH (CVSS v3.1 base score 8.8). Insecure Pickle deserialization is a well-documented attack class with ready-made tooling, so the technical barrier to exploitation is near zero for any attacker who can deliver a crafted checkpoint. Exposure is concentrated in data science and ML engineering environments, where model sharing is routine and checkpoint provenance is rarely enforced. The absence of confirmed exploitation and snorkel's niche footprint (compared with LangChain or transformers) moderate immediate urgency. When the flaw is triggered, however, the result is full code execution on training infrastructure, with high impact on confidentiality, integrity, and availability.
What should I do?
1. Monitor snorkel-team/snorkel releases for a patch, and upgrade as soon as one is available.
2. As an immediate workaround, prohibit calls to Trainer.load() with checkpoint files from any source outside your internal, access-controlled artifact registry.
3. Enforce artifact provenance: all model checkpoints must originate from internal registries (MLflow Model Registry, DVC remote, private S3/GCS with strict ACLs).
4. Scan existing checkpoint files with fickling: `pip install fickling && python -m fickling --check-safety <checkpoint.pkl>`. Potentially malicious payloads will be flagged.
5. In environments where snorkel cannot be removed, wrap torch.load() calls in a subprocess sandbox. Where feasible, store model weights in the safetensors format instead.
6. Monitor training job logs for anomalous subprocess spawning, unexpected network connections, or file writes during checkpoint loading.
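The fickling scan in the steps above is the advisory's recommendation. The sketch below is a stdlib-only illustration of the same idea: it inspects a checkpoint statically, without unpickling it, by walking the opcodes with `pickletools` and listing every global the file would import. It is not fickling and does not replace it. `ALLOWED_GLOBALS`, `unexpected_globals()`, and the other helpers are hypothetical, and the allowlist is an assumption that you would tune to your own artifacts. The sketch reads the `.pkl` members of zip-format torch checkpoints, or back-to-back pickle streams in a plain file. It fails closed when nothing parses. Protocol-4 imports that reuse memoized strings are reported as unresolved rather than decoded.

```python
import pickletools
import zipfile
from collections import deque
from pathlib import Path

# Hypothetical allowlist for a plain tensor/optimizer checkpoint. This is an
# assumption, not an exhaustive list of what torch.save() may emit.
ALLOWED_GLOBALS = {
    "collections.OrderedDict",
    "torch._utils._rebuild_tensor_v2",
    "torch._utils._rebuild_parameter",
    "torch.FloatStorage",
    "torch.LongStorage",
}

STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE",
              "SHORT_BINSTRING", "BINSTRING", "STRING"}
# Opcodes that leave the stack unchanged, so they cannot hide a STACK_GLOBAL operand.
PASSIVE_OPS = {"MEMOIZE", "PUT", "BINPUT", "LONG_BINPUT", "FRAME"}


def split_pickles(blob: bytes) -> list:
    """Split back-to-back pickle streams; stop at the first bytes that do not parse."""
    streams, start = [], 0
    while start < len(blob):
        try:
            end = next(start + pos + 1
                       for op, _arg, pos in pickletools.genops(blob[start:])
                       if op.name == "STOP")
        except (ValueError, StopIteration):
            break
        streams.append(blob[start:end])
        start = end
    return streams


def imported_globals(stream: bytes) -> list:
    """Names of every global the stream would import, found without unpickling it."""
    found = []
    top2 = deque([None, None], maxlen=2)  # last two pushes if known strings, else None
    for op, arg, _pos in pickletools.genops(stream):
        if op.name in PASSIVE_OPS:
            continue
        if op.name in ("GLOBAL", "INST"):
            module, name = arg.split(" ", 1)
            found.append(f"{module}.{name}")
        elif op.name == "STACK_GLOBAL":
            module, name = top2
            found.append(f"{module}.{name}" if module and name
                         else "<unresolved STACK_GLOBAL>")
        elif op.name in ("EXT1", "EXT2", "EXT4"):
            found.append("<copyreg extension code>")
        top2.append(arg if op.name in STRING_OPS else None)
    return found


def unexpected_globals(path: Path) -> list:
    """Globals outside ALLOWED_GLOBALS; fails closed if no pickle stream parses."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            blobs = [zf.read(n) for n in zf.namelist() if n.endswith(".pkl")]
    else:
        blobs = [path.read_bytes()]
    streams = [s for blob in blobs for s in split_pickles(blob)]
    if not streams:
        return ["<no parseable pickle stream>"]
    return [g for s in streams for g in imported_globals(s) if g not in ALLOWED_GLOBALS]
```

An empty result means that only allowlisted globals were seen. This narrows the risk but does not remove it. Any other result should block the Trainer.load() call for that file.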
What does CISA's SSVC say?
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
Is CVE-2026-31222 actively exploited?
No confirmed active exploitation of CVE-2026-31222 has been reported. Because no fixed release is available yet, organizations should still apply the mitigations proactively.
What systems are affected by CVE-2026-31222?
This vulnerability affects the following AI/ML architecture patterns: Training pipelines, Weak supervision pipelines, ML experiment environments, Shared GPU clusters, CI/CD ML training jobs.
What is the CVSS score for CVE-2026-31222?
CVE-2026-31222 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 0.39%.
What is the AI security impact?
MITRE ATLAS Techniques
- AML.T0010.001 AI Software
- AML.T0011.000 Unsafe AI Artifacts
- AML.T0018.002 Embed Malware
- AML.T0112.001 AI Artifacts
What are the technical details?
Original Advisory
The snorkel library through v0.10.0 contains an insecure deserialization vulnerability (CWE-502) in the Trainer.load() method of the Trainer class. The method loads model checkpoint files using torch.load() without enabling the security-restrictive weights_only=True parameter. Under torch.load()'s historical default (weights_only=False), the Pickle module can deserialize arbitrary Python objects. A remote attacker can exploit this by providing a maliciously crafted model file, which leads to arbitrary code execution on the victim's system when the file is loaded via the vulnerable method.
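The missing flag matters because weights_only=True makes torch.load() use a restricted unpickler. That unpickler rebuilds only tensors, primitive containers, and an allowlist of types, instead of importing whatever global the file names. The stdlib sketch below shows that principle using the documented `pickle.Unpickler.find_class` hook. `SAFE_GLOBALS`, `AllowlistUnpickler`, and `restricted_loads()` are hypothetical names, and this is not PyTorch's implementation. In snorkel itself, the corresponding fix would be to pass weights_only=True to the torch.load() call inside Trainer.load(). torch.load() has accepted that keyword since PyTorch 1.13, and PyTorch 2.6 made it the default. Installations that pair snorkel with older torch releases therefore keep the unsafe behavior.

```python
import io
import pickle

# Illustrative allowlist (an assumption): the only global this sketch will import.
SAFE_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any global not explicitly allowlisted."""

    def find_class(self, module: str, name: str):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        # A __reduce__ payload needs a callable such as os.system; block it here,
        # before the unpickler can call it.
        raise pickle.UnpicklingError(f"blocked global {module}.{name}")


def restricted_loads(data: bytes):
    """Deserialize data, rejecting anything that needs a non-allowlisted global."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

Plain dicts, lists, numbers, and strings need no globals, so they load unchanged. A crafted object whose `__reduce__` points at `os.system` raises `UnpicklingError` before anything runs.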
Exploitation Scenario
An adversary targeting an ML engineering team publishes a poisoned snorkel checkpoint to a shared model registry or sends it via a collaborative channel (Slack, GitHub PR attachment, email). The victim data scientist calls Trainer.load('checkpoint.pt') to resume training. Snorkel invokes torch.load() on the crafted file, triggering unrestricted Pickle deserialization. A __reduce__ payload in the file spawns a reverse shell or drops a cron-based backdoor running as the data science user. From this foothold the attacker exfiltrates training datasets, AWS/GCP credentials from environment variables, API keys, and pivots to cloud storage buckets attached to the training environment — all without any privileged access being required.
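In this scenario the payload runs inside the torch.load() call itself, so it is visible only by watching what happens during loading. One hedged, in-process way to record that is a Python audit hook (PEP 578, Python 3.8+). The unpickler raises a `pickle.find_class` audit event for every global it resolves, and process-spawning or networking calls such as `os.system`, `subprocess.Popen`, and `socket.connect` raise their own events. The watcher below is a sketch, and `WATCHED_EVENTS` and `audit_during_load()` are hypothetical names. Audit hooks cannot be removed once installed, so the hook records only while the context manager is active. Code running in the same process can tamper with it, so treat the output as telemetry, not as a security control.

```python
import contextlib
import sys
import threading

# Hypothetical selection of PEP 578 audit events worth logging during a load.
WATCHED_EVENTS = {
    "pickle.find_class",  # every global the unpickler imports
    "os.system",
    "os.exec",
    "os.posix_spawn",
    "subprocess.Popen",
    "socket.connect",
}

_local = threading.local()


def _audit_hook(event: str, args: tuple) -> None:
    # Audit hooks are process-wide and permanent, so record events only while
    # a watcher is active on the current thread.
    log = getattr(_local, "log", None)
    if log is not None and event in WATCHED_EVENTS:
        log.append((event, args))


sys.addaudithook(_audit_hook)


@contextlib.contextmanager
def audit_during_load():
    """Collect watched audit events raised while the with-block runs."""
    _local.log = []
    try:
        yield _local.log
    finally:
        _local.log = None
```

A loader wrapper could run Trainer.load() inside `with audit_during_load() as events:` and forward `events` to the job log. It could then alert on any process or socket event, or on any `pickle.find_class` entry that names an unexpected module.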
Weaknesses (CWE)
CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.
- [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
- [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
Source: MITRE CWE corpus.
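Applied to checkpoints, the HMAC mitigation above means tagging each file when it is published to the internal registry and verifying the tag before Trainer.load() runs. `sign_checkpoint()` and `verify_checkpoint()` are hypothetical helpers. Where the key lives (for example a secrets manager or the registry itself) is an assumption left to your environment. A valid tag proves only that a key holder signed the file and that it has not been modified since. It does not make Pickle deserialization safe, so it complements weights_only=True rather than replacing it.

```python
import hashlib
import hmac
from pathlib import Path


def sign_checkpoint(path: Path, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the checkpoint file's bytes."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with path.open("rb") as fh:
        # Stream in 1 MiB chunks so large checkpoints are not read into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            mac.update(chunk)
    return mac.hexdigest()


def verify_checkpoint(path: Path, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare it in constant time with the stored one."""
    return hmac.compare_digest(sign_checkpoint(path, key), expected_tag)
```

A loading wrapper would call `verify_checkpoint()` first and refuse to proceed when it returns False.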
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
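The 8.8 base score follows from this vector through the CVSS v3.1 base equations published by FIRST. The sketch below reproduces it. It handles only Scope:Unchanged vectors, which this one is, and `base_score()` and `roundup()` are illustrative helpers, not an official calculator.

```python
import math

# CVSS v3.1 metric weights from the FIRST specification (the subset needed here).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a Scope:Unchanged CVSS:3.1 vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope:Unchanged")
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    impact = 6.42 * iss  # about 5.87 for C:H/I:H/A:H
    exploitability = (8.22 * AV[metrics["AV"]] * AC[metrics["AC"]]
                      * PR_UNCHANGED[metrics["PR"]] * UI[metrics["UI"]])  # about 2.84
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 8.8
```

The only factor that keeps this vector below 9.8 is UI:R (weight 0.62 instead of 0.85). That factor reflects the victim having to load the crafted checkpoint.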
Related Vulnerabilities
- CVE-2024-5452 (9.8) pytorch-lightning: RCE via deepdiff Delta deserialization [same package: torch]
- CVE-2023-43654 (9.8) TorchServe: SSRF + RCE via unrestricted model URL loading [same package: torch]
- CVE-2022-45907 (9.8) PyTorch: RCE via unsafe eval in JIT annotations [same package: torch]
- CVE-2022-0845 (9.8) pytorch-lightning: code injection enables full RCE [same package: torch]
- CVE-2024-35198 (9.8) TorchServe: URL bypass enables arbitrary model loading [same package: torch]