CVE-2021-43811: Sockeye: unsafe YAML load RCE via model config file

HIGH PoC AVAILABLE
Published December 8, 2021
CISO Take

If your team downloads and runs Sockeye models from external sources, a malicious model config can execute arbitrary code on the engineer's workstation at load time, before any inference occurs. This is a supply chain attack: an adversary publishes a poisoned model and waits for someone to pull and run it. Upgrade to Sockeye 2.3.24 or later immediately and enforce model artifact sourcing policies.

Risk Assessment

Severity is high in practice despite the local attack vector rating. CVSS AV:L understates real-world exposure because ML practitioners routinely pull pre-trained models from public repositories (GitHub, Hugging Face, model zoos) with minimal vetting. Exploitation is trivial: crafting a malicious PyYAML object requires no AI/ML expertise. Automated MLOps pipelines that load model configs without human review are the highest-risk targets, because the payload fires silently during model initialization.

Affected Systems

Package    Ecosystem    Vulnerable Range    Patched
sockeye    pip          < 2.3.24            2.3.24

Do you use sockeye? You're affected if you run any version below 2.3.24.
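The affected range can be checked programmatically. The following is a minimal sketch, assuming the package is installed under the distribution name sockeye and uses plain numeric version strings; the helper names (parse_version, is_vulnerable, installed_sockeye_status) are illustrative and not part of Sockeye.

```python
import re
from importlib import metadata

# First fixed release according to the advisory text.
PATCHED = (2, 3, 24)


def parse_version(text):
    """Turn '2.3.17' into (2, 3, 17); anything after the third number is ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_vulnerable(version_text):
    """True when the version is below the patched release."""
    return parse_version(version_text) < PATCHED


def installed_sockeye_status():
    """Describe the sockeye distribution in the current environment, if any."""
    try:
        version = metadata.version("sockeye")
    except metadata.PackageNotFoundError:
        return "sockeye is not installed"
    state = "VULNERABLE" if is_vulnerable(version) else "patched"
    return f"sockeye {version}: {state}"


if __name__ == "__main__":
    print(installed_sockeye_status())
```

Running the script inside each virtual environment or container image that loads models gives a quick inventory; it does not replace a dependency scanner.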

Severity & Risk

CVSS 3.1
7.8 / 10
EPSS
8.7%
chance of exploitation in 30 days
Higher than 93% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Trivial
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

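AV (Attack Vector): Local
AC (Attack Complexity): Low
PR (Privileges Required): None
UI (User Interaction): Required
S (Scope): Unchanged
C (Confidentiality): High
I (Integrity): High
A (Availability): High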

Recommended Action

  1. Upgrade Sockeye to >=2.3.24 immediately.

  2. If patching is blocked: restrict model loading to internally-signed artifacts only—no external model downloads without security review.

  3. Audit all ML codebases for yaml.load() calls and replace with yaml.safe_load() universally.

  4. Implement model artifact signing and integrity verification in MLOps pipelines (cosign, DVC, or similar).

  5. Detection: monitor for unexpected process spawning or outbound network connections triggered during model load operations.

  6. Consider sandboxing model evaluation environments (containers, restricted VMs) to limit blast radius.
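Step 3 (auditing for yaml.load() calls) can be partly automated. The sketch below uses Python's ast module to flag calls written as yaml.load(...) that do not pass SafeLoader or CSafeLoader. It is an assumption-laden heuristic: it only recognises the literal module name yaml, so aliased imports (import yaml as y) or from yaml import load are missed, and the function name find_unsafe_yaml_loads is illustrative.

```python
import ast

SAFE_LOADERS = ("SafeLoader", "CSafeLoader")


def find_unsafe_yaml_loads(source, filename="<string>"):
    """Return sorted (line, message) pairs for yaml.load calls without a safe Loader."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_yaml_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "yaml"
        )
        if not is_yaml_load:
            continue
        # The Loader may be given by keyword or as the second positional argument.
        loader = next((kw.value for kw in node.keywords if kw.arg == "Loader"), None)
        if loader is None and len(node.args) >= 2:
            loader = node.args[1]
        # yaml.SafeLoader is an Attribute node; a bare SafeLoader is a Name node.
        loader_name = getattr(loader, "attr", getattr(loader, "id", None))
        if loader_name not in SAFE_LOADERS:
            findings.append((node.lineno, f"yaml.load with Loader={loader_name}"))
    return sorted(findings)


if __name__ == "__main__":
    sample = (
        "import yaml\n"
        "a = yaml.load(f)\n"
        "b = yaml.load(f, Loader=yaml.SafeLoader)\n"
        "c = yaml.safe_load(f)\n"
    )
    for line, message in find_unsafe_yaml_loads(sample):
        print(f"line {line}: {message}")
```

Each flagged call still needs manual review before it is replaced with yaml.safe_load(), since configs that rely on custom tags will need a safe constructor registered for them.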

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 9 - Risk Management System
ISO 42001
A.6.2 - AI system supply chain
NIST AI RMF
MS-2.5 - AI model and data provenance
OWASP LLM Top 10
LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2021-43811?

CVE-2021-43811 is an arbitrary code execution vulnerability in Sockeye, an open-source sequence-to-sequence framework for Neural Machine Translation. Versions below 2.3.24 load YAML model and data configs unsafely, so code embedded in a malicious config file runs on the machine that loads the model, before any inference occurs. The issue is fixed in version 2.3.24.

Is CVE-2021-43811 actively exploited?

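Proof-of-concept exploit code for CVE-2021-43811 is publicly available (indexed in trickest/cve), which increases the risk of exploitation. This page rates exploitation as MEDIUM with medium confidence and does not cite confirmed in-the-wild attacks.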

How to fix CVE-2021-43811?

  1. Upgrade Sockeye to >=2.3.24 immediately.

  2. If patching is blocked: restrict model loading to internally-signed artifacts only, with no external model downloads without security review.

  3. Audit all ML codebases for yaml.load() calls and replace with yaml.safe_load() universally.

  4. Implement model artifact signing and integrity verification in MLOps pipelines (cosign, DVC, or similar).

  5. Detection: monitor for unexpected process spawning or outbound network connections triggered during model load operations.

  6. Consider sandboxing model evaluation environments (containers, restricted VMs) to limit blast radius.
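As an illustration of the integrity verification in step 4, the sketch below checks a model directory against a manifest of SHA-256 digests before anything is loaded. It is not a replacement for signing tools such as cosign: the manifest itself must come from a trusted, separately protected source. The function names and the file name config used in the usage note are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_dir(model_dir, expected):
    """Compare a model directory against {relative_name: hex_digest}.

    Returns a list of problems; an empty list means every listed file
    matched and no unlisted file was present.
    """
    root = Path(model_dir)
    problems = []
    for name, want in sorted(expected.items()):
        target = root / name
        if not target.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(target) != want.lower():
            problems.append(f"digest mismatch: {name}")
    # Files absent from the manifest are treated as suspicious as well.
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root).as_posix()
        if path.is_file() and relative not in expected:
            problems.append(f"unlisted: {relative}")
    return problems
```

A pipeline would call verify_model_dir(model_dir, manifest) and refuse to load the model unless the returned list is empty, for example with a manifest such as {"config": "<hex digest recorded at publish time>"}.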

What systems are affected by CVE-2021-43811?

Any system that loads model or data configs with a Sockeye version below 2.3.24 is affected. The most exposed AI/ML architecture patterns are NMT training pipelines, model serving, ML model distribution, MLOps CI/CD pipelines, and research environments.

What is the CVSS score for CVE-2021-43811?

CVE-2021-43811 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 8.72%.

Technical Details

NVD Description

Sockeye is an open-source sequence-to-sequence framework for Neural Machine Translation built on PyTorch. Sockeye uses YAML to store model and data configurations on disk. Versions below 2.3.24 use unsafe YAML loading, which can be made to execute arbitrary code embedded in config files. An attacker can add malicious code to the config file of a trained model and attempt to convince users to download and run it. If users run the model, the embedded code will run locally. The issue is fixed in version 2.3.24.

Exploitation Scenario

An adversary publishes a Sockeye-compatible pre-trained NMT model (e.g., English-Spanish translation) to a public repository, promoting it via social channels or SEO-optimized documentation. The model's YAML config contains a crafted PyYAML directive (!!python/object/apply:subprocess.check_output or similar) that spawns a reverse shell or exfiltrates cloud credentials (AWS_ACCESS_KEY_ID, GCP service account tokens) upon deserialization. An ML engineer downloads the model to benchmark it against their production system—even just to evaluate quality—and the payload executes with their local privileges, potentially pivoting to cloud infrastructure, training data stores, or the corporate network.
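Configs from untrusted sources can be triaged as plain text, without parsing them, by looking for the PyYAML Python-specific tags the scenario relies on. The sketch below is a heuristic only: it matches the !!python/ shorthand and the expanded tag:yaml.org,2002:python/ form, and it can be evaded (for example through %TAG directives that define a different handle), so it complements safe loading rather than replacing it. The function name is illustrative.

```python
import re

# Shorthand (!!python/...) and expanded (tag:yaml.org,2002:python/...) tag forms.
PYTHON_TAG = re.compile(r"!!python/[\w/]+|tag:yaml\.org,2002:python/[\w/]+")


def find_python_tags(yaml_text):
    """Return (line_number, tag) pairs for Python-specific YAML tags in the text."""
    hits = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        for match in PYTHON_TAG.finditer(line):
            hits.append((number, match.group(0)))
    return hits


if __name__ == "__main__":
    suspicious = (
        "num_layers: 6\n"
        "hook: !!python/object/apply:subprocess.check_output [['id']]\n"
    )
    # The text is only scanned as a string here; it is never deserialized.
    for number, tag in find_python_tags(suspicious):
        print(f"line {number}: {tag}")
```

Any hit warrants manual review of the file before it is opened with a YAML loader.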

Weaknesses (CWE)

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
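The 7.8 base score follows from this vector under the CVSS v3.1 base equations. The sketch below reproduces the arithmetic for Scope: Unchanged vectors only, using the metric weights and Roundup function from the v3.1 specification; the function names are illustrative and the Scope: Changed branch is deliberately left out.

```python
import math

AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}  # weights when Scope is Unchanged
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """Round up to one decimal place using the integer method from the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score for a Scope: Unchanged vector."""
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(item.split(":") for item in metrics)
    if m["S"] != "U":
        raise NotImplementedError("this sketch covers Scope: Unchanged only")
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR_UNCHANGED[m["PR"]] * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # prints 7.8
```

For this vector the impact term is about 5.87 and the exploitability term about 1.83; their sum, about 7.71, rounds up to 7.8.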

Timeline

Published
December 8, 2021
Last Modified
November 21, 2024
First Seen
December 8, 2021

Related Vulnerabilities