CVE-2021-29576: TensorFlow heap buffer overflow

CISO Take

A heap buffer overflow in TensorFlow's MaxPool3DGradGrad operation can lead to arbitrary code execution by a local low-privileged user. Shared ML training infrastructure and multi-tenant Jupyter/GPU environments carry the highest exposure. Patch to TF 2.5.0 or apply the available backports immediately; enforce sandboxed execution of untrusted TF computation graphs as a compensating control.

What is the risk?

The CVSS score is 7.8 (High), with a local attack vector and low privileges required. Real-world risk is concentrated in multi-tenant ML training environments: shared GPU clusters, internal Jupyter hubs, and MLOps platforms (Kubeflow, Vertex AI Workbench). Because attack complexity is low once local access is obtained, a moderately skilled attacker can reliably trigger the overflow. Isolated single-user workstations carry lower urgency but still warrant patching given the C:H/I:H/A:H impact triad.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	< 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1	2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

Do you use TensorFlow? You are affected unless you run 2.5.0 or later, or one of the patched backports (2.4.2, 2.3.3, 2.2.3, 2.1.4).

How severe is it?

CVSS 3.1: 7.8 / 10 (High)

EPSS: 0.2% chance of exploitation in 30 days (higher than 11% of all CVEs)

Source: EPSS v3 — FIRST.org

Exploitation Status: Exploit Available

Exploitation: Medium

Sophistication: Moderate

Exploitation Confidence: Medium

Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Local

AC Low

PR Low

UI None

S Unchanged

C High

I High

A High
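
The 7.8 base score follows from the metric values listed above. As a worked check, the sketch below applies the CVSS v3.1 base-score equations for unchanged scope. The function and variable names are illustrative; the numeric weights are those published in the CVSS v3.1 specification.

```python
import math

# CVSS v3.1 metric weights (Privileges Required values are for unchanged scope).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the CVSS v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_unchanged_scope(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a vector whose Scope metric is Unchanged."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:L / AC:L / PR:L / UI:N / S:U / C:H / I:H / A:H, as listed for this CVE
score = base_score_unchanged_scope("L", "L", "L", "N", "H", "H", "H")
print(score)
```

The impact term is about 5.87 and the exploitability term about 1.83; their sum, roughly 7.71, rounds up to 7.8.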

What should I do?

5 steps

1. Upgrade TensorFlow to 2.5.0+, or apply backports: 2.4.2, 2.3.3, 2.2.3, 2.1.4 (patch commit: 63c6a29d0f2d).
2. Audit all TF versions across training servers, Docker images, and CI/CD pipelines, and pin to patched versions.
3. Restrict execution of untrusted or user-submitted TF computation graphs via containerization and seccomp/AppArmor profiles.
4. In multi-tenant ML platforms, enforce least-privilege for workload runners; avoid running training jobs as root.
5. Monitor TF workload processes for anomalous behavior (unexpected child processes, unusual memory access patterns).
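
The version audit can be partly automated. The following is a minimal sketch, assuming the patched releases are exactly those named in the upgrade step (2.5.0 and later, plus 2.4.2, 2.3.3, 2.2.3, 2.1.4). The helper names and the list of pip distribution names to probe are assumptions for illustration, and pre-release or otherwise unparseable version strings are conservatively reported as unpatched.

```python
import re
from importlib import metadata

# First fixed patch release on each backported branch, per the advisory text.
FIXED_PATCH_BY_BRANCH = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}

# Assumption: distribution names under which TensorFlow is commonly installed.
CANDIDATE_DISTRIBUTIONS = ("tensorflow", "tensorflow-cpu", "tensorflow-gpu")


def is_patched(version: str) -> bool:
    """Return True if a plain X.Y.Z version contains the fix for CVE-2021-29576."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        # Pre-releases and unknown formats: treat as unpatched (conservative).
        return False
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) >= (2, 5):
        return True
    fixed_patch = FIXED_PATCH_BY_BRANCH.get((major, minor))
    return fixed_patch is not None and patch >= fixed_patch


def installed_tensorflow_versions() -> dict:
    """Map each installed candidate distribution to its version string."""
    found = {}
    for name in CANDIDATE_DISTRIBUTIONS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found


if __name__ == "__main__":
    for name, version in installed_tensorflow_versions().items():
        status = "patched" if is_patched(version) else "VULNERABLE"
        print(f"{name} {version}: {status}")
```

Run inside each environment or container image under audit, this prints one line per installed distribution. It inspects only the active Python environment, so images with several environments need one run per interpreter.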

How is it classified?

Code Execution

Framework

AML.T0010.001 - AI Software

AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity of high-risk AI systems

ISO 42001

A.6.2 - AI risk management — software dependency vulnerability management

NIST AI RMF

MANAGE 2.2 - Mechanisms for identifying and responding to AI system vulnerabilities

OWASP LLM Top 10

LLM03:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2021-29576?

CVE-2021-29576 is a heap buffer overflow in TensorFlow's `tf.raw_ops.MaxPool3DGradGrad` operation. The kernel does not check that `Pool3dParameters` initialized successfully, so it can proceed with invalid parameters and access memory out of bounds. A local low-privileged user may be able to turn this into arbitrary code execution. The fix is in TF 2.5.0 and the 2.4.2, 2.3.3, 2.2.3, and 2.1.4 backports.

Is CVE-2021-29576 actively exploited?

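This page indexes a public proof of concept for CVE-2021-29576 (trickest/cve) and does not report confirmed in-the-wild exploitation. The availability of proof-of-concept code still increases the risk of exploitation.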

How to fix CVE-2021-29576?

1. Upgrade TensorFlow to 2.5.0+, or apply backports: 2.4.2, 2.3.3, 2.2.3, 2.1.4 (patch commit: 63c6a29d0f2d).
2. Audit all TF versions across training servers, Docker images, and CI/CD pipelines, and pin to patched versions.
3. Restrict execution of untrusted or user-submitted TF computation graphs via containerization and seccomp/AppArmor profiles.
4. In multi-tenant ML platforms, enforce least-privilege for workload runners; avoid running training jobs as root.
5. Monitor TF workload processes for anomalous behavior (unexpected child processes, unusual memory access patterns).

What systems are affected by CVE-2021-29576?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML platforms.

What is the CVSS score for CVE-2021-29576?

CVE-2021-29576 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.21%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, shared ML platforms

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: A.6.2

NIST AI RMF: MANAGE 2.2

OWASP LLM Top 10: LLM03:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. The implementation of `tf.raw_ops.MaxPool3DGradGrad` is vulnerable to a heap buffer overflow. The implementation (https://github.com/tensorflow/tensorflow/blob/596c05a159b6fbb9e39ca10b3f7753b7244fa1e9/tensorflow/core/kernels/pooling_ops_3d.cc#L694-L696) does not check that the initialization of `Pool3dParameters` completes successfully. Since the constructor (https://github.com/tensorflow/tensorflow/blob/596c05a159b6fbb9e39ca10b3f7753b7244fa1e9/tensorflow/core/kernels/pooling_ops_3d.cc#L48-L88) uses `OP_REQUIRES` to validate conditions, the first assertion that fails interrupts the initialization of `params`, making it contain invalid data. In turn, this might cause a heap buffer overflow, depending on default initialized values. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
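
The failure pattern the advisory describes can be illustrated outside TensorFlow. The sketch below is a Python analogy, not TensorFlow's source: `OpContext`, `PoolParams`, and both `compute_*` functions are hypothetical names, and the large placeholder value stands in for whatever an uninitialized C++ field happens to hold. It shows a constructor that records an error and returns early, and why the caller has to check the context before trusting the object.

```python
class OpContext:
    """Stand-in for a kernel context that records the first validation error."""

    def __init__(self):
        self.error = None

    def ok(self) -> bool:
        return self.error is None

    def fail(self, message: str) -> None:
        if self.error is None:
            self.error = message


class PoolParams:
    """Hypothetical parameter object whose validation can stop half-way."""

    def __init__(self, ctx: OpContext, ksize: int, input_len: int):
        # Placeholder for an uninitialized field (arbitrary garbage in C++).
        self.out_len = 10**6
        if ksize <= 0 or ksize > input_len:
            # Like a failed OP_REQUIRES: record the error and leave the
            # constructor only. The caller keeps running.
            ctx.fail("invalid ksize")
            return
        self.out_len = input_len - ksize + 1


def compute_unchecked(ctx: OpContext, ksize: int, data: list) -> int:
    """Vulnerable pattern: trusts params even if construction failed."""
    params = PoolParams(ctx, ksize, len(data))
    return params.out_len


def compute_checked(ctx: OpContext, ksize: int, data: list):
    """Safer pattern: stop as soon as the context reports an error."""
    params = PoolParams(ctx, ksize, len(data))
    if not ctx.ok():
        return None
    return params.out_len


ctx = OpContext()
print(compute_unchecked(ctx, 0, [1, 2, 3]), ctx.error)
print(compute_checked(OpContext(), 0, [1, 2, 3]))
```

In Python the bogus size is only a wrong number. In a C++ kernel the same value would size or index raw buffers, which is how an unchecked construction failure can become an out-of-bounds memory access.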

Exploitation Scenario

An attacker with shell access on a shared GPU training server (e.g., a compromised data scientist account) crafts a Python script calling tf.raw_ops.MaxPool3DGradGrad with parameters designed to fail Pool3dParameters initialization. The constructor's OP_REQUIRES check aborts initialization, leaving the params struct containing invalid data. When the op proceeds with corrupted params, a heap buffer overflow occurs—giving the attacker the opportunity to overwrite heap metadata and achieve code execution under the TF process owner. In common MLOps environments where training jobs run as privileged service accounts or inside containers with host mounts, this can escalate to full host or cluster compromise.
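
Because the scenario starts from a user-supplied Python script, a platform that accepts submitted code could add a coarse pre-execution screen. The sketch below uses the standard `ast` module to flag direct references to `raw_ops.MaxPool3DGradGrad`. The function name and the flagged-op set are assumptions for illustration. The check is a heuristic only: it is easily bypassed (for example via `getattr`, import aliases of the op itself, or serialized graphs), so it complements sandboxing and patching rather than replacing them.

```python
import ast

# Assumption: ops the platform wants to flag for review before execution.
FLAGGED_OPS = {"MaxPool3DGradGrad"}


def find_flagged_raw_ops(source: str) -> list:
    """Return sorted (line, op_name) pairs for references like `x.raw_ops.<Op>`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Attribute)
            and node.attr in FLAGGED_OPS
            and isinstance(node.value, ast.Attribute)
            and node.value.attr == "raw_ops"
        ):
            hits.append((node.lineno, node.attr))
    return sorted(hits)


submitted = "import tensorflow as tf\nop = tf.raw_ops.MaxPool3DGradGrad\n"
print(find_flagged_raw_ops(submitted))
```

A job runner could call `find_flagged_raw_ops` on each submitted file and route any hit to manual review instead of rejecting it outright, since the op also has legitimate uses.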

Weaknesses (CWE)

CWE-787 Out-of-bounds Write (Primary)

CWE-119 Improper Restriction of Operations within the Bounds of a Memory Buffer

CWE-787 — Out-of-bounds Write: The product writes data past the end, or before the beginning, of the intended buffer.

[Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, many languages that perform their own memory management, such as Java and Perl, are not subject to buffer overflows. Other languages, such as Ada and C#, typically provide overflow protection, but the protection can be disabled by the programmer. Be wary that a language's interface to native code may still be subject to overflows, even if the language itself is theoretically safe.
[Architecture and Design] Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Examples include the Safe C String Library (SafeStr) by Messier and Viega [REF-57], and the Strsafe.h library from Microsoft [REF-56]. These libraries provide safer versions of overflow-prone string-handling functions.

Source: MITRE CWE corpus.