CVE-2021-29577: TensorFlow heap overflow

CISO Take

Upgrade TensorFlow to 2.5.0, or to the patched backport release for your line (2.1.4, 2.2.3, 2.3.3, or 2.4.2), immediately. This heap buffer overflow can enable local code execution within ML training and serving environments. That is a real threat on shared GPU clusters, Jupyter hubs, and MLOps platforms where multiple users submit workloads. Audit any multi-tenant ML infrastructure for exposure before assuming the risk is low.

What is the risk?

CVSS 7.8 (High). The local attack vector, low attack complexity, and low privilege requirement mean that any authenticated user or compromised process on shared ML infrastructure can trigger this flaw. The flaw is not directly exploitable remotely. However, real-world ML environments such as Kubeflow clusters, shared Jupyter servers, and TF Serving deployments frequently expose TensorFlow ops to multiple principals, so effective exposure is higher than the local attack vector suggests. No active exploitation was known at the time of publication.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	< 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1	2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

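Do you use TensorFlow? You're affected if you run a version earlier than 2.5.0 that is not at or above the patched release for its line (2.1.4, 2.2.3, 2.3.3, or 2.4.2).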
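To triage an environment, a small Python sketch can compare an installed version string with the fixed releases named in the advisory: 2.5.0, plus 2.1.4, 2.2.3, 2.3.3, and 2.4.2 on the older lines. The function name is hypothetical. The sketch assumes the pip distribution is named `tensorflow`; variants such as `tensorflow-gpu` or `tensorflow-cpu` need their own lookup. It also ignores pre-release and local suffixes, so review release candidates by hand.

```python
import re

# First patched patch release on each older line, per the advisory's backport list.
BACKPORT_FIXES = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def is_patched_for_cve_2021_29577(version: str) -> bool:
    """Return True if a TensorFlow version string should contain the AvgPool3DGrad fix.

    Only the leading MAJOR.MINOR.PATCH is parsed; suffixes such as 'rc0' or
    '+local' are ignored (a simplification, so review release candidates by hand).
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) >= (2, 5):
        return True  # 2.5.0 and later ship the fix
    if (major, minor) in BACKPORT_FIXES:
        return patch >= BACKPORT_FIXES[(major, minor)]
    return False  # 1.x and 2.0.x: no backport listed, so treat as vulnerable


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version as dist_version

    try:
        installed = dist_version("tensorflow")  # assumes the 'tensorflow' distribution
    except PackageNotFoundError:
        print("tensorflow is not installed in this environment")
    else:
        verdict = "patched" if is_patched_for_cve_2021_29577(installed) else "VULNERABLE"
        print(f"tensorflow {installed}: {verdict}")
```

Running the same check inside each training image, serving container, and CI runner covers the inventory step without needing TensorFlow to import successfully.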

How severe is it?

CVSS 3.1: 7.8 / 10

EPSS: 0.2% chance of exploitation in 30 days, higher than 11% of all CVEs

Source: EPSS v3, FIRST.org

Exploitation Status

Exploit available

Exploitation: MEDIUM

Sophistication: Moderate

Exploitation confidence: medium

Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

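Vector: CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H

Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High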
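The 7.8 base score follows from the vector CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H through the CVSS v3.1 base-score equations published by FIRST. The Python sketch below reproduces that arithmetic for Scope:Unchanged vectors only. It illustrates the calculation and does not replace the official calculator. The function names are illustrative.

```python
import math

# CVSS v3.1 base-metric weights (FIRST specification); PR uses the Scope:Unchanged values.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
AC = {"L": 0.77, "H": 0.44}
PR_SCOPE_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= input, computed float-safely."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a CVSS v3.1 vector with Scope:Unchanged (sketch, no validation)."""
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope:Unchanged vectors")
    iss = 1 - ((1 - CIA[metrics["C"]])
               * (1 - CIA[metrics["I"]])
               * (1 - CIA[metrics["A"]]))
    impact = 6.42 * iss
    exploitability = (8.22 * AV[metrics["AV"]] * AC[metrics["AC"]]
                      * PR_SCOPE_UNCHANGED[metrics["PR"]] * UI[metrics["UI"]])
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.8
```

For this CVE, ISS = 1 - 0.44^3 ≈ 0.9148, Impact ≈ 5.873, and Exploitability ≈ 1.835. Roundup(7.708) gives 7.8. The local attack vector (0.55) keeps the score below the 9.8 of an otherwise identical network-reachable flaw.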

What should I do?


Patch: Upgrade to TensorFlow 2.5.0 or to the patched release on your line (2.1.4, 2.2.3, 2.3.3, or 2.4.2). If you build from source, cherry-pick commit 6fc9141.
Workaround: Where your own code passes untrusted shapes to AvgPool3DGrad, first verify that the first and last dimensions of orig_input_shape and grad match. This does not protect against users who can run arbitrary TensorFlow code.
Isolation: Ensure ML training workloads from untrusted users run in isolated containers with dropped capabilities and no host-level privilege.
Detection: Monitor TF worker processes for unexpected crashes or heap corruption signals; review core dumps if available.
Inventory: Identify all TensorFlow versions in use across training, inference, and CI/CD pipelines—containerized and bare-metal.
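For a simple start on the detection step, watch how TensorFlow worker processes exit. On POSIX systems, Python's `subprocess` module reports a child killed by signal N as return code -N. Heap corruption in native code commonly ends in SIGSEGV, or in SIGABRT when the allocator detects the damage. The sketch assumes a POSIX host. The `run_worker` helper and its alert hook are hypothetical placeholders, not part of any TensorFlow or orchestration API.

```python
import signal
import subprocess
import sys
from typing import Sequence

# Signals that commonly accompany heap corruption in native code: a segfault,
# or an abort raised by the allocator's own consistency checks.
SUSPICIOUS_SIGNALS = {int(signal.SIGSEGV), int(signal.SIGABRT)}
if hasattr(signal, "SIGBUS"):  # SIGBUS is not defined on Windows
    SUSPICIOUS_SIGNALS.add(int(signal.SIGBUS))


def is_memory_corruption_signal(returncode: int) -> bool:
    """True if a POSIX child died from one of the suspicious signals (returncode == -N)."""
    return returncode < 0 and -returncode in SUSPICIOUS_SIGNALS


def run_worker(cmd: Sequence[str]) -> int:
    """Run a (hypothetical) TF worker command and flag crashes worth triaging."""
    result = subprocess.run(list(cmd))
    if is_memory_corruption_signal(result.returncode):
        # Replace this print with your alerting hook, and preserve any core dump.
        print(f"ALERT: {cmd[0]} died with signal {-result.returncode}", file=sys.stderr)
    return result.returncode
```

For example, `run_worker(["python", "train.py"])` would run a hypothetical training script and alert if it dies from one of those signals. A crash is a lead for triage, not proof of exploitation.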

How is it classified?

Categories: Code Execution, Framework, Inference

MITRE ATLAS: AML.T0010.001 (AI Software), AML.T0011.001 (Malicious Package), AML.T0049 (Exploit Public-Facing Application)

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Art. 15 - Accuracy, robustness and cybersecurity

ISO 42001

8.4 - AI system technical robustness and security

NIST AI RMF

MANAGE 2.2 - Mechanisms to address potentially adverse AI system impacts

OWASP LLM Top 10

LLM03:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2021-29577?

CVE-2021-29577 is a heap buffer overflow in TensorFlow's implementation of `tf.raw_ops.AvgPool3DGrad`. The kernel assumes that the `orig_input_shape` and `grad` tensors have matching first and last dimensions but never validates that assumption. It is fixed in TensorFlow 2.5.0 and in the 2.1.4, 2.2.3, 2.3.3, and 2.4.2 backport releases.

Is CVE-2021-29577 actively exploited?

No active exploitation was known at the time of publication. However, proof-of-concept exploit code for CVE-2021-29577 is publicly available, which increases the risk of exploitation.

How to fix CVE-2021-29577?

Upgrade to TensorFlow 2.5.0 or to the patched release on your line (2.1.4, 2.2.3, 2.3.3, or 2.4.2), or cherry-pick commit 6fc9141 if you build from source. Then isolate untrusted ML workloads, monitor TensorFlow workers for crashes, and inventory TensorFlow versions across training, inference, and CI/CD pipelines, as detailed under "What should I do?" above.

What systems are affected by CVE-2021-29577?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, MLOps platforms.

What is the CVSS score for CVE-2021-29577?

CVE-2021-29577 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.21%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, MLOps platforms

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0011.001 Malicious Package

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art. 15

ISO 42001: 8.4

NIST AI RMF: MANAGE 2.2

OWASP LLM Top 10: LLM03:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. The implementation of `tf.raw_ops.AvgPool3DGrad` is vulnerable to a heap buffer overflow. The implementation (https://github.com/tensorflow/tensorflow/blob/d80ffba9702dc19d1fac74fc4b766b3fa1ee976b/tensorflow/core/kernels/pooling_ops_3d.cc#L376-L450) assumes that the `orig_input_shape` and `grad` tensors have similar first and last dimensions but does not check that this assumption is validated. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
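The unchecked assumption can be enforced in Python before the op is invoked, which is the workaround listed under "What should I do?". The helper below is hypothetical, not TensorFlow API. It assumes the default NDHWC layout (batch first, channels last) and the rank-5 shapes that AvgPool3D uses. It only helps where your own code builds the call from untrusted input. A user who can run arbitrary TensorFlow code can skip it, so upgrading remains the fix.

```python
from typing import Sequence


def validate_avgpool3d_grad_shapes(orig_input_shape: Sequence[int],
                                   grad_shape: Sequence[int]) -> None:
    """Reject shape pairs that the unpatched AvgPool3DGrad kernel would trust blindly.

    Hypothetical pre-flight check for NDHWC layouts: both shapes must be rank 5,
    and the batch (first) and channel (last) dimensions must match.
    """
    if len(orig_input_shape) != 5 or len(grad_shape) != 5:
        raise ValueError(
            f"expected rank-5 shapes, got {len(orig_input_shape)} and {len(grad_shape)}")
    if orig_input_shape[0] != grad_shape[0]:
        raise ValueError(
            f"batch mismatch: orig_input_shape[0]={orig_input_shape[0]}, grad[0]={grad_shape[0]}")
    if orig_input_shape[-1] != grad_shape[-1]:
        raise ValueError(
            f"channel mismatch: orig_input_shape[-1]={orig_input_shape[-1]}, "
            f"grad[-1]={grad_shape[-1]}")


# Hypothetical call site (requires TensorFlow; shapes must be fully defined):
#   validate_avgpool3d_grad_shapes(orig_shape_list, grad.shape.as_list())
#   tf.raw_ops.AvgPool3DGrad(orig_input_shape=orig_shape_list, grad=grad,
#                            ksize=[1, 2, 2, 2, 1], strides=[1, 2, 2, 2, 1],
#                            padding="VALID")

validate_avgpool3d_grad_shapes([4, 8, 8, 8, 3], [4, 4, 4, 4, 3])  # consistent: no error
try:
    validate_avgpool3d_grad_shapes([4, 8, 8, 8, 3], [16, 4, 4, 4, 3])
except ValueError as err:
    print(f"rejected: {err}")
```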

Exploitation Scenario

An attacker with data-scientist-level access to a shared Kubeflow cluster submits a crafted training job that calls tf.raw_ops.AvgPool3DGrad with intentionally mismatched tensor shapes, for example orig_input_shape with batch size 4 but grad with batch size 16. Because the kernel never checks that the shapes agree, it writes past the end of its heap-allocated output buffer and corrupts adjacent memory. With a carefully shaped payload, the attacker could potentially achieve arbitrary code execution within the TensorFlow process. Depending on what that process can reach, the attacker could then exfiltrate co-tenants' model checkpoints, training data, or environment secrets, or attempt to escape the container to the host node.
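To make the mechanism concrete, here is a toy Python model of the unchecked write pattern. It is not TensorFlow's kernel. The output buffer is sized from orig_input_shape, while the loop trusts grad's batch and channel counts. The model ignores the pooling window and writes one value per (batch, channel) pair at spatial position (0, 0, 0), so it only illustrates the batch-size mismatch from the scenario. Where C++ would silently corrupt the heap, the model just counts writes that would land past the end of the buffer. The function name is hypothetical.

```python
from math import prod
from typing import Sequence


def count_out_of_bounds_writes(orig_input_shape: Sequence[int],
                               grad_shape: Sequence[int]) -> int:
    """Count writes that would fall past an NDHWC buffer sized from orig_input_shape."""
    _, depth, height, width, channels = orig_input_shape
    buffer_len = prod(orig_input_shape)                # elements actually allocated
    batch_stride = depth * height * width * channels   # elements per batch entry
    out_of_bounds = 0
    for b in range(grad_shape[0]):         # unchecked: trusts grad's batch size
        for c in range(grad_shape[-1]):    # unchecked: trusts grad's channel count
            offset = b * batch_stride + c  # spatial position (0, 0, 0) only
            if offset >= buffer_len:
                out_of_bounds += 1         # in C++ this store would overrun the heap buffer
    return out_of_bounds


print(count_out_of_bounds_writes([4, 2, 2, 2, 3], [4, 1, 1, 1, 3]))   # 0
print(count_out_of_bounds_writes([4, 2, 2, 2, 3], [16, 1, 1, 1, 3]))  # 36
```

With matching batch sizes every write stays inside the 96-element buffer. With grad claiming 16 batches instead of 4, the last 12 batches' writes (36 in this toy) all land past the allocation. The patched kernel rejects exactly that condition.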

Weaknesses (CWE)

CWE-787 Out-of-bounds Write (primary)

CWE-119 Improper Restriction of Operations within the Bounds of a Memory Buffer

CWE-787 — Out-of-bounds Write: The product writes data past the end, or before the beginning, of the intended buffer.

[Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, many languages that perform their own memory management, such as Java and Perl, are not subject to buffer overflows. Other languages, such as Ada and C#, typically provide overflow protection, but the protection can be disabled by the programmer. Be wary that a language's interface to native code may still be subject to overflows, even if the language itself is theoretically safe.
[Architecture and Design] Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Examples include the Safe C String Library (SafeStr) by Messier and Viega [REF-57], and the Strsafe.h library from Microsoft [REF-56]. These libraries provide safer versions of overflow-prone string-handling functions.

Source: MITRE CWE corpus.