CVE-2021-43811: Sockeye: unsafe YAML load RCE via model config file

Severity: HIGH | PoC available
Published December 8, 2021
CISO Take

If your team downloads and runs Sockeye models from external sources, a malicious model config can execute arbitrary code on the engineer's workstation at load time, before any inference occurs. This is a supply chain attack: an adversary publishes a poisoned model and waits for someone to pull and run it. Upgrade to Sockeye 2.3.24 or later immediately and enforce model artifact sourcing policies.

What is the risk?

High severity in practice despite the local attack vector rating. The CVSS AV:L (local) rating understates real-world exposure because ML practitioners routinely pull pre-trained models from public repositories (GitHub, Hugging Face, model zoos) with minimal vetting. Exploitability is trivial: crafting a malicious PyYAML payload requires no AI/ML expertise. Automated MLOps pipelines that load model configs without human review are the highest-risk targets, because the payload fires silently during model initialization.

What systems are affected?

Package: sockeye
Ecosystem: PyPI
Vulnerable range: < 2.3.24
Patched: 2.3.24

Do you use sockeye below version 2.3.24? You're affected.
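A quick way to check an environment is to compare the installed version with the first fixed release, 2.3.24. The sketch below assumes Sockeye was installed under the distribution name `sockeye` and compares only the leading numeric release components, so a suffix such as `rc1` or `.post1` is ignored; `packaging.version.Version` is the more rigorous choice if that library is available. The helper names are illustrative.

```python
import re
from importlib import metadata

PATCHED = (2, 3, 24)  # first fixed release according to the advisory


def release_tuple(version):
    """Extract the leading numeric release components, e.g. '2.3.23' -> (2, 3, 23)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_vulnerable(version):
    """True if the version's numeric release is below 2.3.24 (suffixes are ignored)."""
    release = release_tuple(version)
    # Pad both tuples to the same length so (2, 3, 24) compares equal to (2, 3, 24, 0).
    width = max(len(release), len(PATCHED))
    padded = release + (0,) * (width - len(release))
    patched = PATCHED + (0,) * (width - len(PATCHED))
    return padded < patched


if __name__ == "__main__":
    try:
        installed = metadata.version("sockeye")
    except metadata.PackageNotFoundError:
        print("sockeye is not installed in this environment")
    else:
        status = "VULNERABLE" if is_vulnerable(installed) else "patched"
        print(f"sockeye {installed}: {status}")
```

Run it inside each virtual environment or container image that might carry Sockeye; a site-wide scan of one interpreter says nothing about the others.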

How severe is it?

CVSS 3.1: 7.8 / 10 (HIGH)
EPSS: 2.4% chance of exploitation in the next 30 days (higher than 82% of all CVEs)
Exploitation status: Exploit available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation confidence: Medium (public PoC indexed in trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

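AV (Attack Vector): Local
AC (Attack Complexity): Low
PR (Privileges Required): None
UI (User Interaction): Required
S (Scope): Unchanged
C (Confidentiality): High
I (Integrity): High
A (Availability): High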

What should I do?

  1. Upgrade Sockeye to >=2.3.24 immediately.

  2. If patching is blocked: restrict model loading to internally-signed artifacts only—no external model downloads without security review.

  3. Audit all ML codebases for yaml.load(), yaml.unsafe_load() and yaml.full_load() calls and replace them with yaml.safe_load() (or an explicit Loader=yaml.SafeLoader) universally.

  4. Implement model artifact signing and integrity verification in MLOps pipelines (cosign, DVC, or similar).

  5. Detection: monitor for unexpected process spawning or outbound network connections triggered during model load operations.

  6. Consider sandboxing model evaluation environments (containers, restricted VMs) to limit blast radius.
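The audit in step 3 can be partly automated with Python's `ast` module. The sketch below is a conservative assumption-laden linter, not an official tool: it flags `yaml.load`/`yaml.load_all` calls whose Loader is not one of PyYAML's safe or base loaders, and flags `unsafe_load` and `full_load` unconditionally. It only recognises the `yaml.<name>(...)` attribute form, so aliased imports such as `from yaml import load` need a separate grep.

```python
import ast
import sys
from pathlib import Path

# Loaders treated as safe for untrusted input (an assumption; adjust to your policy).
SAFE_LOADERS = {"SafeLoader", "CSafeLoader", "BaseLoader", "CBaseLoader"}
# Calls flagged regardless of their arguments.
ALWAYS_FLAG = {"unsafe_load", "unsafe_load_all", "full_load", "full_load_all"}
# Calls whose safety depends on the Loader argument.
LOADER_CALLS = {"load", "load_all"}


def _loader_name(node):
    """Return the trailing name of a Loader expression, e.g. yaml.SafeLoader -> 'SafeLoader'."""
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return None


def find_unsafe_yaml_loads(source, filename="<string>"):
    """Yield (line, col) of yaml.* load calls that do not use a safe Loader."""
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only the attribute form `yaml.<name>(...)` is recognised.
        if not (isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id == "yaml"):
            continue
        if func.attr in ALWAYS_FLAG:
            yield node.lineno, node.col_offset
        elif func.attr in LOADER_CALLS:
            loader = None
            if len(node.args) >= 2:  # positional form: yaml.load(stream, Loader)
                loader = _loader_name(node.args[1])
            for kw in node.keywords:
                if kw.arg == "Loader":
                    loader = _loader_name(kw.value)
            if loader not in SAFE_LOADERS:  # missing or non-safe Loader
                yield node.lineno, node.col_offset


def scan_tree(root):
    """Print file:line:col for every flagged call in .py files under root."""
    for path in sorted(Path(root).rglob("*.py")):
        try:
            hits = list(find_unsafe_yaml_loads(path.read_text(encoding="utf-8"), str(path)))
        except (OSError, SyntaxError, ValueError, UnicodeDecodeError):
            continue  # unreadable or unparsable file: skip it
        for line, col in hits:
            print(f"{path}:{line}:{col}: yaml load without a safe Loader")


if __name__ == "__main__" and len(sys.argv) > 1:
    scan_tree(sys.argv[1])
```

Usage: `python find_yaml_loads.py path/to/repo` (the script name is hypothetical). Each hit is a candidate for replacement with `yaml.safe_load()`; reviewing them by hand is still necessary, because some code legitimately loads trusted, self-generated YAML.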


Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Article 9 - Risk Management System
ISO 42001: A.6.2 - AI system supply chain
NIST AI RMF: MS-2.5 - AI model and data provenance
OWASP LLM Top 10: LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2021-43811?

CVE-2021-43811 is a code injection vulnerability in Sockeye, an open-source neural machine translation framework built on PyTorch. Sockeye versions below 2.3.24 read model and data configuration files with unsafe YAML loading, so a crafted config file shipped with a trained model executes arbitrary code on the machine that loads the model. The issue is fixed in Sockeye 2.3.24.

Is CVE-2021-43811 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2021-43811, increasing the risk of exploitation.

How to fix CVE-2021-43811?

Upgrade Sockeye to 2.3.24 or later. If patching is blocked, restrict model loading to internally signed artifacts and allow no external model downloads without security review. In addition, replace yaml.load() with yaml.safe_load() across ML codebases, implement model artifact signing and integrity verification in MLOps pipelines (cosign, DVC, or similar), monitor model load operations for unexpected process spawning or outbound network connections, and sandbox model evaluation environments (containers, restricted VMs) to limit blast radius.
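Artifact integrity verification can start with something as simple as a pinned SHA-256 manifest for each model directory. The sketch below assumes a hypothetical manifest of the form `{relative_path: sha256_hex}` obtained through a trusted channel separate from the model download; it complements, rather than replaces, signature verification with tools such as cosign. It rejects missing files, modified files, and extra files that are not in the manifest (for example an injected YAML config).

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_dir(model_dir, pinned):
    """Compare every file in model_dir against a pinned {relative_path: sha256_hex} manifest.

    Returns a list of problems; an empty list means the directory matches exactly.
    """
    root = Path(model_dir)
    problems = []
    for rel_name, expected in sorted(pinned.items()):
        path = root / rel_name
        if not path.is_file():
            problems.append(f"missing: {rel_name}")
        elif sha256_of(path) != expected.lower():
            problems.append(f"digest mismatch: {rel_name}")
    # Files absent from the manifest are rejected too, so nothing unreviewed gets loaded.
    for path in sorted(root.rglob("*")):
        rel_name = path.relative_to(root).as_posix()
        if path.is_file() and rel_name not in pinned:
            problems.append(f"unexpected file: {rel_name}")
    return problems
```

A pipeline would call `verify_model_dir()` before handing the directory to Sockeye and abort on any non-empty result. Hash pinning only proves the files are the ones someone reviewed; it says nothing about whether that review was adequate.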

What systems are affected by CVE-2021-43811?

This vulnerability affects the following AI/ML architecture patterns: NMT training pipelines, model serving, ML model distribution, MLOps CI/CD pipelines, research environments.

What is the CVSS score for CVE-2021-43811?

CVE-2021-43811 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 2.42%.

What is the AI security impact?

Affected AI Architectures

NMT training pipelines, model serving, ML model distribution, MLOps CI/CD pipelines, research environments

MITRE ATLAS Techniques

AML.T0010.003 Model
AML.T0011.000 Unsafe AI Artifacts
AML.T0018.002 Embed Malware
AML.T0058 Publish Poisoned Models

Compliance Controls Affected

EU AI Act: Article 9
ISO 42001: A.6.2
NIST AI RMF: MS-2.5
OWASP LLM Top 10: LLM05

What are the technical details?

Original Advisory

Sockeye is an open-source sequence-to-sequence framework for Neural Machine Translation built on PyTorch. Sockeye uses YAML to store model and data configurations on disk. Versions below 2.3.24 use unsafe YAML loading, which can be made to execute arbitrary code embedded in config files. An attacker can add malicious code to the config file of a trained model and attempt to convince users to download and run it. If users run the model, the embedded code will run locally. The issue is fixed in version 2.3.24.
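The core of the problem is the difference between PyYAML's unsafe loading and `yaml.safe_load()`. The sketch below is not Sockeye's code; it assumes PyYAML 5.1 or later (which provides `yaml.UnsafeLoader`) and uses a deliberately harmless payload: the `!!python/object/apply` tag calls `builtins.len` where a real attack would call something like `subprocess.check_output`. The config key `num_layers` is invented for illustration.

```python
import yaml  # PyYAML

# Harmless stand-in for a malicious config: the tag asks the loader to call
# builtins.len([1, 2, 3]) while parsing, instead of spawning a process.
CONFIG = "num_layers: !!python/object/apply:builtins.len [[1, 2, 3]]\n"


def load_unsafely(text):
    # UnsafeLoader resolves python/* tags: it imports the named module and calls
    # the named function during parsing. Never use it on untrusted files.
    return yaml.load(text, Loader=yaml.UnsafeLoader)


def load_safely(text):
    # safe_load builds only plain YAML types (dict, list, str, int, ...) and
    # raises a ConstructorError when it meets a python/* tag.
    return yaml.safe_load(text)


if __name__ == "__main__":
    print(load_unsafely(CONFIG))  # the function named in the config ran during parsing
    try:
        load_safely(CONFIG)
    except yaml.YAMLError as err:
        print("rejected by safe_load:", err)
```

With the unsafe loader the value of `num_layers` is whatever the called function returned, which is exactly how an attacker's code runs "at load time"; `safe_load` refuses the document outright.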

Exploitation Scenario

An adversary publishes a Sockeye-compatible pre-trained NMT model (e.g., English-Spanish translation) to a public repository, promoting it via social channels or SEO-optimized documentation. The model's YAML config contains a crafted PyYAML directive (!!python/object/apply:subprocess.check_output or similar) that spawns a reverse shell or exfiltrates cloud credentials (AWS_ACCESS_KEY_ID, GCP service account tokens) upon deserialization. An ML engineer downloads the model to benchmark it against their production system—even just to evaluate quality—and the payload executes with their local privileges, potentially pivoting to cloud infrastructure, training data stores, or the corporate network.
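Where a pipeline must gate third-party model configs before any loader touches them (for example, while an upgrade is still pending), one option is to inspect YAML tags without constructing objects. The sketch below assumes PyYAML and relies on `yaml.compose_all()`, which stops at the node graph, so a tag such as `!!python/object/apply:subprocess.check_output` is reported instead of executed. The function name is hypothetical and this is a defense-in-depth check, not a substitute for `safe_load`.

```python
import yaml  # PyYAML

PYTHON_TAG_PREFIX = "tag:yaml.org,2002:python/"


def python_tags(text):
    """List python/* tags in a YAML stream without constructing any Python objects."""
    # compose_all() only builds nodes: tags are recorded, but no constructor runs,
    # so a malicious tag is reported rather than executed.
    stack = list(yaml.compose_all(text, Loader=yaml.SafeLoader))
    seen, found = set(), []
    while stack:
        node = stack.pop()
        if node is None or id(node) in seen:  # aliases can point at shared nodes
            continue
        seen.add(id(node))
        if node.tag.startswith(PYTHON_TAG_PREFIX):
            found.append(node.tag)
        if isinstance(node, yaml.SequenceNode):
            stack.extend(node.value)
        elif isinstance(node, yaml.MappingNode):
            for key, value in node.value:
                stack.extend((key, value))
    return found
```

Because tags are resolved by the parser, this also catches the verbatim `!<tag:yaml.org,2002:python/...>` form and custom `%TAG` handles that a plain text search for `!!python` would miss. A non-empty result should quarantine the model and raise an alert.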

Weaknesses (CWE)

CWE-94 — Improper Control of Generation of Code ('Code Injection'): The product constructs all or part of a code segment using externally-influenced input from an upstream component, but it does not neutralize or incorrectly neutralizes special elements that could modify the syntax or behavior of the intended code segment.

  • [Architecture and Design] Refactor your program so that you do not have to dynamically generate code.
  • [Architecture and Design] Run your code in a "jail" or similar sandbox environment that enforces strict boundaries between the process and the operating system. This may effectively restrict which code can be executed by your product. Examples include the Unix chroot jail and AppArmor. In general, managed code may provide some protection. This may not be a feasible solution, and it only limits the impact to the operating system; the rest of your application may still be subject to compromise. Be careful to avoid CWE-243 and other weaknesses related to jails.

Source: MITRE CWE corpus.
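A real jail (chroot, AppArmor, a container or VM) is beyond a short snippet. A weaker but easy first step, swapped in here as an assumption rather than a sandbox, is to evaluate untrusted models in a child process that does not inherit credential-bearing environment variables such as `AWS_ACCESS_KEY_ID`, the kind of secret named in the exploitation scenario. The name patterns below are illustrative, and the helper names are hypothetical.

```python
import os
import re
import subprocess
import sys

# Variable names that commonly carry credentials. Illustrative only: extend it
# for your environment (an assumption, not an exhaustive list).
SECRET_NAME = re.compile(
    r"^AWS_|^AZURE_|^GOOGLE_APPLICATION_CREDENTIALS$|TOKEN|SECRET|PASSWORD|API_KEY",
    re.IGNORECASE,
)


def scrubbed_env(environ):
    """Return a copy of environ without credential-like variables."""
    return {name: value for name, value in environ.items() if not SECRET_NAME.search(name)}


def run_isolated(argv, timeout=300.0):
    """Run an untrusted model-evaluation command in a child process with secrets stripped.

    This only removes inherited environment variables; it is not a sandbox.
    Credential files on disk and network access remain reachable.
    """
    return subprocess.run(
        argv,
        env=scrubbed_env(dict(os.environ)),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


if __name__ == "__main__":
    # Show what a child process can see of one typical secret.
    probe = "import os; print(os.environ.get('AWS_SECRET_ACCESS_KEY', 'not visible'))"
    print(run_isolated([sys.executable, "-c", probe]).stdout.strip())
```

The child can still read `~/.aws/credentials` or instance metadata endpoints, which is why the container or restricted-VM isolation recommended above remains the stronger control.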

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
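The 7.8 base score follows directly from this vector. The sketch below implements the base-metric equations and the Roundup function from the CVSS v3.1 specification (temporal and environmental metrics are out of scope); the function names are illustrative.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on Scope (Unchanged / Changed).
PR_WEIGHTS = {"U": {"N": 0.85, "L": 0.62, "H": 0.27}, "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, robust to float error."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def parse_vector(vector):
    """Split 'CVSS:3.1/AV:L/...' into {'AV': 'L', ...}."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    return dict(part.split(":", 1) for part in parts)


def base_score(vector):
    m = parse_vector(vector)
    scope = m["S"]
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[m["C"]]) * (1 - cia[m["I"]]) * (1 - cia[m["A"]])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                      * PR_WEIGHTS[scope][m["PR"]] * WEIGHTS["UI"][m["UI"]])
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 7.8
```

For this CVE, the impact subscore is about 5.87 and exploitability about 1.83; their sum, about 7.71, rounds up to 7.8. The local attack vector (0.55) and required user interaction (0.62) are what keep the score below the 9.8 a network-reachable, no-interaction variant would get.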

Timeline

Published: December 8, 2021
Last modified: November 21, 2024
First seen: December 8, 2021
