CVE-2021-41203: TensorFlow: malformed checkpoint triggers overflow/crash

HIGH PoC AVAILABLE
Published November 5, 2021
CISO Take

Attackers who can modify TensorFlow checkpoint files on disk can crash training or inference processes via integer overflows and undefined behavior. Patch to TF 2.7.0 / 2.6.1 / 2.5.2 / 2.4.4 immediately — any shared storage or model registry accessible to low-privileged users is a viable attack path. Treat checkpoint files as untrusted inputs and enforce integrity checks (checksums, access controls) before loading.

Risk Assessment

CVSS 7.8 High with local attack vector and low complexity/privileges. Risk is elevated in MLOps environments with shared storage (NFS, S3, NAS) where checkpoints are written by one process and loaded by another — a compromised low-privilege account or insider threat can trigger crashes or undefined behavior across the ML stack. Not in CISA KEV and no known active exploitation, but the attack primitive (craft malicious file → crash ML process) is trivial once filesystem access is obtained.

Affected Systems

Package      Ecosystem   Vulnerable Range                                     Patched
tensorflow   pip         < 2.4.4; >= 2.5.0, < 2.5.2; >= 2.6.0, < 2.6.1        2.4.4, 2.5.2, 2.6.1, 2.7.0

Do you use tensorflow? You are affected if your installed release predates 2.7.0, 2.6.1, 2.5.2, or 2.4.4 on its respective branch.
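A minimal sketch of a version check against the patched releases named in this advisory. The function name `is_patched` and the conservative handling of pre-release tags and of branches older than 2.4 are assumptions made for illustration, not part of the advisory.

```python
import re

# Minimum patched patch-level per release branch, taken from the advisory text.
PATCHED_MIN = {(2, 4): 4, (2, 5): 2, (2, 6): 1}
FIRST_FIXED_BRANCH = (2, 7)  # 2.7.0 and later ship the fix


def is_patched(version: str) -> bool:
    """Return True if `version` is one of the releases carrying the fix."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(.*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(g) for g in match.groups()[:3])
    if match.group(4):
        # Assumption: pre-release or local builds (e.g. "2.7.0rc1") are
        # treated as unpatched until someone verifies them by hand.
        return False
    if (major, minor) >= FIRST_FIXED_BRANCH:
        return True
    min_patch = PATCHED_MIN.get((major, minor))
    # Assumption: branches with no backport (older than 2.4) stay vulnerable.
    return min_patch is not None and patch >= min_patch
```

On a host with TensorFlow installed, `is_patched(importlib.metadata.version("tensorflow"))` would apply the check to the local environment. Variant distributions such as `tensorflow-cpu` or `tensorflow-gpu` would need to be queried under their own package names.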

Severity & Risk

CVSS 3.1
7.8 / 10
EPSS
0.02%
chance of exploitation in 30 days
Higher than 5% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Moderate
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

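Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High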

Recommended Action

  1. Patch: Upgrade to TensorFlow 2.7.0, 2.6.1, 2.5.2, or 2.4.4 immediately.

  2. Restrict filesystem permissions: checkpoint directories should be writable only by the process that creates them; separate write/read service accounts.

  3. Integrity verification: implement SHA-256 checksums on checkpoint files and validate before loading — reject any checkpoint that fails verification.

  4. Immutable storage: use write-once/append-only storage policies for checkpoint artifacts in production.

  5. Detection: monitor for unexpected process crashes (segfaults, `CHECK`-fail aborts, OOM kills) in TF training/inference workloads; repeated crashes on checkpoint-loading paths may indicate active exploitation.

  6. Audit: inventory all systems running unpatched TF versions, prioritize those with shared checkpoint storage.
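The integrity verification in step 3 can be sketched as a SHA-256 manifest that is written when a checkpoint is saved and checked before it is loaded. The helper names (`build_manifest`, `verify_checkpoint`) and the manifest layout (file name mapped to hex digest) are hypothetical. The sketch also assumes the manifest itself is stored where the checkpoint writer's peers cannot modify it; otherwise an attacker could simply regenerate it.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoint shards need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(checkpoint_dir: Path) -> Dict[str, str]:
    """Record a digest for every regular file in the checkpoint directory."""
    return {
        p.name: sha256_of(p)
        for p in sorted(checkpoint_dir.iterdir())
        if p.is_file()
    }


def verify_checkpoint(checkpoint_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the directory matches."""
    problems = []
    present = {p.name for p in checkpoint_dir.iterdir() if p.is_file()}
    for name, expected in manifest.items():
        if name not in present:
            problems.append(f"missing: {name}")
        elif sha256_of(checkpoint_dir / name) != expected:
            problems.append(f"digest mismatch: {name}")
    # Files that were never recorded are rejected too (injected shards).
    for name in sorted(present - set(manifest)):
        problems.append(f"unexpected file: {name}")
    return problems
```

A loader would call `verify_checkpoint` first and refuse to hand the directory to TensorFlow if the returned list is non-empty. A plain digest only detects tampering when the manifest is trusted; a keyed construction such as an HMAC would be one alternative where that cannot be guaranteed.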

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity Article 9 - Risk management system
ISO 42001
8.4 - AI system lifecycle — data and model integrity 9.1 - Monitoring, measurement, analysis and evaluation
NIST AI RMF
GOVERN 1.1 - Policies, processes, and procedures for AI risk management MANAGE 2.2 - Mechanisms to sustain the value of deployed AI systems

Frequently Asked Questions

What is CVE-2021-41203?

CVE-2021-41203 is a vulnerability in TensorFlow's checkpoint-loading code, which does not validate malformed file formats. An attacker who can modify saved checkpoints from outside TensorFlow can trigger undefined behavior, integer overflows, segfaults, and `CHECK`-fail crashes in training or inference processes. It is fixed in TensorFlow 2.7.0, 2.6.1, 2.5.2, and 2.4.4.

Is CVE-2021-41203 actively exploited?

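No active exploitation is known, and CVE-2021-41203 is not listed in CISA KEV. However, proof-of-concept exploit code is publicly available (indexed in trickest/cve), which increases the risk of exploitation.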

How to fix CVE-2021-41203?

  1. Patch: Upgrade to TensorFlow 2.7.0, 2.6.1, 2.5.2, or 2.4.4 immediately.

  2. Restrict filesystem permissions: checkpoint directories should be writable only by the process that creates them; separate write/read service accounts.

  3. Integrity verification: implement SHA-256 checksums on checkpoint files and validate before loading; reject any checkpoint that fails verification.

  4. Immutable storage: use write-once/append-only storage policies for checkpoint artifacts in production.

  5. Detection: monitor for unexpected process crashes (segfaults, OOM) in TF training/inference workloads; repeated crashes against checkpoint-loading paths may indicate active exploitation.

  6. Audit: inventory all systems running unpatched TF versions, prioritize those with shared checkpoint storage.

What systems are affected by CVE-2021-41203?

Any deployment that loads checkpoints with an unpatched TensorFlow release is affected. The AI/ML architecture patterns most exposed are training pipelines, model serving, MLOps CI/CD pipelines, transfer learning workflows, and distributed training infrastructure.

What is the CVSS score for CVE-2021-41203?

CVE-2021-41203 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.02%.

Technical Details

NVD Description

TensorFlow is an open source platform for machine learning. In affected versions an attacker can trigger undefined behavior, integer overflows, segfaults and `CHECK`-fail crashes if they can change saved checkpoints from outside of TensorFlow. This is because the checkpoints loading infrastructure is missing validation for invalid file formats. The fixes will be included in TensorFlow 2.7.0. We will also cherrypick these commits on TensorFlow 2.6.1, TensorFlow 2.5.2, and TensorFlow 2.4.4, as these are also affected and still in supported range.

Exploitation Scenario

An adversary with low-privilege access to a shared MLOps environment (e.g., a compromised data scientist account, a malicious insider, or a supply chain compromise of a model registry) locates the checkpoint storage directory for a production training or fine-tuning job. They craft a malformed checkpoint file, manipulating file format fields to trigger integer overflow conditions, and replace or inject it at the expected checkpoint path. When the TensorFlow training process resumes from the checkpoint (e.g., in a nightly scheduled training job), it loads the malicious file without validation, triggering undefined behavior, segfaults, or `CHECK`-fail crashes. In a Kubernetes-based ML training cluster, this could repeatedly crash pods and disrupt model delivery pipelines. In the worst case, the attacker could leverage the undefined behavior to execute code under the training process's service account.
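One way to limit the crash impact described in this scenario is to open an untrusted checkpoint in a short-lived child process first, so a segfault or `CHECK` failure kills the probe instead of the training job. The sketch below is an assumption-laden illustration: `probe_checkpoint` and `DEFAULT_CHILD` are hypothetical names, the child relies on `tf.train.load_checkpoint` and the reader's `get_tensor` to exercise the loading path, and the signal reporting assumes a POSIX host. It only contains crashes; it does not sandbox code execution arising from undefined behavior.

```python
import signal
import subprocess
import sys
from typing import Optional

# Child program: reads every tensor of the checkpoint named in argv[1].
# Assumption: touching each tensor is enough to exercise the loading path.
DEFAULT_CHILD = (
    "import sys, tensorflow as tf; "
    "r = tf.train.load_checkpoint(sys.argv[1]); "
    "[r.get_tensor(n) for n in r.get_variable_to_shape_map()]"
)


def probe_checkpoint(
    path: str, child_code: str = DEFAULT_CHILD, timeout: float = 300.0
) -> Optional[str]:
    """Return None if the child exits cleanly, else a short failure description."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", child_code, path],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from a signal,
        # e.g. SIGSEGV (segfault) or SIGABRT (failed CHECK).
        sig = -proc.returncode
        try:
            return f"killed by {signal.Signals(sig).name}"
        except ValueError:
            return f"killed by signal {sig}"
    if proc.returncode != 0:
        return f"exit code {proc.returncode}"
    return None
```

A scheduler could call `probe_checkpoint(path)` before resuming a job and quarantine the checkpoint when the result is not `None`, which also gives the detection step a concrete signal to alert on.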

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
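As a worked check, the 7.8 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The sketch below hard-codes the metric weights from the CVSS v3.1 specification and covers base metrics only; the function names are illustrative.

```python
# Metric weights from the CVSS v3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (scaled // 10000 + 1) / 10.0


def base_score(vector: str) -> float:
    # Drop the "CVSS:3.1" prefix and split the rest into metric/value pairs.
    m = dict(item.split(":") for item in vector.split("/")[1:])
    scope = m["S"]
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))
```

For this CVE the impact sub-score is about 5.87 and the exploitability sub-score about 1.83, so `base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H")` evaluates to 7.8. The low exploitability term reflects the local attack vector; the same impact with a network vector and no privileges required would score 9.8.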

Timeline

Published
November 5, 2021
Last Modified
November 21, 2024
First Seen
November 5, 2021

Related Vulnerabilities