CVE-2021-29576: TensorFlow: heap buffer overflow in MaxPool3DGradGrad op

HIGH PoC AVAILABLE
Published May 14, 2021
CISO Take

A heap buffer overflow in TensorFlow's MaxPool3DGradGrad operation could potentially be leveraged by a local, low-privileged user for arbitrary code execution. Shared ML training infrastructure and multi-tenant Jupyter/GPU environments carry the highest exposure. Upgrade to TF 2.5.0 or a patched backport release immediately, and enforce sandboxed execution of untrusted TF computation graphs as a compensating control.

What is the risk?

CVSS 7.8 (High), with a local attack vector and a low privilege requirement. Real-world risk is concentrated in multi-tenant ML training environments: shared GPU clusters, internal Jupyter hubs, and MLOps platforms (Kubeflow, Vertex AI Workbench). Because attack complexity is low once local access is obtained, a moderately skilled attacker can readily reach the faulty code path; per the upstream advisory, whether this produces a heap overflow depends on default-initialized values. Isolated single-user workstations carry lower urgency but still warrant patching given the C:H/I:H/A:H impact triad.

What systems are affected?

Package: tensorflow (pip)
Vulnerable range: all versions before 2.1.4; 2.2.x before 2.2.3; 2.3.x before 2.3.3; 2.4.x before 2.4.2
Patched: 2.1.4, 2.2.3, 2.3.3, 2.4.2, and 2.5.0 or later

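If you run TensorFlow older than 2.1.4, or a 2.2.x, 2.3.x, or 2.4.x release older than 2.2.3, 2.3.3, or 2.4.2 respectively, you are affected.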
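The affected range can be checked mechanically. The sketch below encodes the patched release per branch named in the advisory (2.1.4, 2.2.3, 2.3.3, 2.4.2, and everything from 2.5.0 on) and reports whether a version string predates the fix; it also reads installed versions with `importlib.metadata` as a starting point for the version audit in the remediation steps. Assumptions: pre-release and local suffixes such as `rc0` are ignored, releases older than 2.1 are treated as vulnerable because they received no backport, and the `tensorflow-cpu`/`tensorflow-gpu` distribution names are checked on the assumption that they ship the same kernel code.

```python
import re
from importlib import metadata

# Patched release on each affected branch, taken from the advisory.
# Branches from 2.5 onward are treated as already containing the fix.
PATCHED = {(2, 1): (2, 1, 4), (2, 2): (2, 2, 3), (2, 3): (2, 3, 3), (2, 4): (2, 4, 2)}


def parse_version(text):
    """Parse the leading 'major.minor[.patch]' of a version string; suffixes are ignored."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        raise ValueError(f"unrecognized version: {text!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def is_vulnerable(text):
    """True if this TensorFlow version predates the CVE-2021-29576 fix on its branch."""
    v = parse_version(text)
    if v >= (2, 5, 0):
        return False
    fixed = PATCHED.get(v[:2])
    if fixed is None:
        # Branches older than 2.1 (including 1.x) never received a backport.
        return True
    return v < fixed


def installed_tensorflow_versions():
    """Map each installed TensorFlow distribution name to its version string."""
    found = {}
    # Assumption: CPU-only and GPU wheels are published under these names.
    for dist in ("tensorflow", "tensorflow-cpu", "tensorflow-gpu"):
        try:
            found[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    for dist, ver in installed_tensorflow_versions().items():
        print(dist, ver, "VULNERABLE" if is_vulnerable(ver) else "patched")
```

Run across training servers, Docker images, and CI runners, the same check gives a quick inventory before pinning to patched versions.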

How severe is it?

CVSS 3.1: 7.8 / 10 (High)
EPSS: 0.2% chance of exploitation in the next 30 days (higher than 11% of all CVEs)
Exploitation status: Exploit available
Exploitation: Medium
Sophistication: Moderate
Exploitation confidence: Medium (public PoC indexed in trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

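Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High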

What should I do?

  1. Upgrade TensorFlow to 2.5.0 or later, or to the patched release for your branch: 2.4.2, 2.3.3, 2.2.3, or 2.1.4 (patch commit: 63c6a29d0f2d).

  2. Audit all TF versions across training servers, Docker images, and CI/CD pipelines—pin to patched versions.

  3. Restrict execution of untrusted or user-submitted TF computation graphs via containerization and seccomp/AppArmor profiles.

  4. In multi-tenant ML platforms, enforce least-privilege for workload runners; avoid running training jobs as root.

  5. Monitor TF workload processes for anomalous behavior, such as unexpected child processes or abnormal terminations (e.g., segmentation faults or aborts) in TF kernels.
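Steps 3 to 5 can be partly illustrated at the process level. The sketch below assumes a POSIX host and runs a Python snippet in a separate, isolated interpreter with address-space, CPU-time, and core-dump limits, then reports when the child dies from a signal such as SIGSEGV, which is a common way a heap overflow in a native kernel surfaces. The limit values and the helper names (`check_not_root`, `run_untrusted`) are illustrative assumptions. Process limits are weaker than the container plus seccomp/AppArmor isolation recommended in step 3; this complements that control rather than replacing it.

```python
import os
import resource
import signal
import subprocess
import sys

# Illustrative limits (assumptions); tune for real workloads.
MEMORY_LIMIT_BYTES = 2 * 1024 ** 3  # 2 GiB of virtual address space
CPU_LIMIT_SECONDS = 60


def check_not_root():
    """Step 4: refuse to launch untrusted work from a root process."""
    if os.geteuid() == 0:
        raise PermissionError("do not run untrusted TF graphs as root")


def _cap(kind, value):
    """Lower both soft and hard limits, never exceeding an existing hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def _apply_limits():
    """Runs in the child just before exec (POSIX only)."""
    _cap(resource.RLIMIT_AS, MEMORY_LIMIT_BYTES)
    _cap(resource.RLIMIT_CPU, CPU_LIMIT_SECONDS)
    _cap(resource.RLIMIT_CORE, 0)  # no core dumps of possibly sensitive memory


def run_untrusted(script, timeout=120):
    """Run a Python snippet in a separate, resource-limited interpreter.

    Returns (returncode, stdout, signal_name); signal_name is set when the
    child was killed by a signal, e.g. SIGSEGV or SIGABRT after heap corruption.
    """
    proc = subprocess.run(
        # -I (isolated mode) ignores PYTHON* env vars and user site-packages.
        [sys.executable, "-I", "-c", script],
        capture_output=True,
        text=True,
        timeout=timeout,
        preexec_fn=_apply_limits,
    )
    signal_name = None
    if proc.returncode < 0:
        # subprocess reports death-by-signal as a negative return code.
        signal_name = signal.Signals(-proc.returncode).name
    return proc.returncode, proc.stdout, signal_name
```

A job launcher would call `check_not_root()` once at startup, pass each user-submitted script to `run_untrusted()`, and alert on any non-None signal name. TensorFlow, particularly with GPUs, reserves far more virtual memory than a bare interpreter, so the 2 GiB `RLIMIT_AS` cap is only a placeholder; `preexec_fn` is also unsafe in multithreaded parents, where a container runtime is the better tool.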

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity of high-risk AI systems
ISO 42001
A.6.2 - AI risk management — software dependency vulnerability management
NIST AI RMF
MANAGE 2.2 - Mechanisms for identifying and responding to AI system vulnerabilities
OWASP LLM Top 10
LLM03:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

Is CVE-2021-29576 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2021-29576, increasing the risk of exploitation.

What systems are affected by CVE-2021-29576?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML platforms.

What is the CVSS score for CVE-2021-29576?

CVE-2021-29576 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.21%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, shared ML platforms

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0049 Exploit Public-Facing Application

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. The implementation of `tf.raw_ops.MaxPool3DGradGrad` is vulnerable to a heap buffer overflow. The implementation (https://github.com/tensorflow/tensorflow/blob/596c05a159b6fbb9e39ca10b3f7753b7244fa1e9/tensorflow/core/kernels/pooling_ops_3d.cc#L694-L696) does not check that the initialization of `Pool3dParameters` completes successfully. Since the constructor (https://github.com/tensorflow/tensorflow/blob/596c05a159b6fbb9e39ca10b3f7753b7244fa1e9/tensorflow/core/kernels/pooling_ops_3d.cc#L48-L88) uses `OP_REQUIRES` to validate conditions, the first assertion that fails interrupts the initialization of `params`, leaving it with invalid data. In turn, this might cause a heap buffer overflow, depending on default-initialized values. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit onto TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
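The bug class is easier to see in a small simulation. `OP_REQUIRES` does not throw: when its condition fails, it records an error on the kernel context and returns from the enclosing function (here the `Pool3dParameters` constructor), so the caller receives a half-built object unless it checks the context status afterwards. The Python sketch below mimics that control flow. `KernelContext`, `Pool3dParams`, and `pool_op` are hypothetical, heavily simplified stand-ins; the pooling arithmetic is a toy rather than the real MaxPool3DGradGrad math, and the status check shown is the general remedy for this pattern, not a reproduction of the upstream patch.

```python
class KernelContext:
    """Hypothetical stand-in for a TF kernel context: failures are recorded, not raised."""

    def __init__(self):
        self.error = None

    def require(self, condition, message):
        """Mimic OP_REQUIRES: record the first failure; the caller must then return."""
        if not condition and self.error is None:
            self.error = message
        return condition


class Pool3dParams:
    """Hypothetical, heavily simplified analogue of Pool3dParameters."""

    def __init__(self, ctx, ksize, input_size):
        # Default-initialized fields: what callers see if validation fails.
        self.window = 0
        self.out_size = 0
        if not ctx.require(len(ksize) == 5, "ksize must have 5 entries"):
            return  # early exit leaves the object only partly initialized
        if not ctx.require(all(k > 0 for k in ksize), "ksize entries must be positive"):
            return
        self.window = ksize[1] * ksize[2] * ksize[3]
        self.out_size = (input_size + self.window - 1) // self.window


def pool_op(ctx, ksize, values, check_status=True):
    """Toy pooling loop; check_status=False reproduces the missing check."""
    params = Pool3dParams(ctx, ksize, len(values))
    if check_status and ctx.error is not None:
        return None  # the remedy: stop before using a partly built params object
    out = [0.0] * params.out_size  # buffer sized from params
    for i, v in enumerate(values):  # loop bound comes from the input, not params
        slot = i // max(params.window, 1)
        # Python raises IndexError here; unchecked C++ indexing would write past the heap buffer.
        out[slot] = max(out[slot], v)
    return out
```

With valid arguments both paths agree; with a malformed `ksize`, the checked path returns early with the recorded error, while the unchecked path sizes its buffer from default values and then indexes past it.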

Exploitation Scenario

An attacker with shell access on a shared GPU training server (e.g., a compromised data scientist account) crafts a Python script that calls tf.raw_ops.MaxPool3DGradGrad with parameters designed to fail Pool3dParameters initialization. An OP_REQUIRES check in the constructor aborts initialization, leaving the params struct holding invalid data. If the op then proceeds with those params, a heap buffer overflow may occur (depending on default-initialized values), which could give the attacker an opportunity to corrupt heap memory and achieve code execution as the TF process owner. In MLOps environments where training jobs run as privileged service accounts or inside containers with host mounts, this could escalate to full host or cluster compromise.

Weaknesses (CWE)

CWE-787 — Out-of-bounds Write: The product writes data past the end, or before the beginning, of the intended buffer.

  • [Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, many languages that perform their own memory management, such as Java and Perl, are not subject to buffer overflows. Other languages, such as Ada and C#, typically provide overflow protection, but the protection can be disabled by the programmer. Be wary that a language's interface to native code may still be subject to overflows, even if the language itself is theoretically safe.
  • [Architecture and Design] Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Examples include the Safe C String Library (SafeStr) by Messier and Viega [REF-57], and the Strsafe.h library from Microsoft [REF-56]. These libraries provide safer versions of overflow-prone string-handling functions.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
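The 7.8 base score follows directly from this vector. Below is a minimal sketch of the CVSS v3.1 base-score calculation, covering base metrics only, with metric weights and the `Roundup` function as given in FIRST's v3.1 specification; it is meant to show the arithmetic, not to replace the official calculator.

```python
import math

# CVSS v3.1 base-metric weights (FIRST specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """Smallest number with one decimal place that is >= value (float-safe form)."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score from a 'CVSS:3.1/AV:..' vector string."""
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    changed = parts["S"] == "C"
    iss = 1 - (1 - CIA[parts["C"]]) * (1 - CIA[parts["I"]]) * (1 - CIA[parts["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[parts["PR"]]
    exploitability = 8.22 * AV[parts["AV"]] * AC[parts["AC"]] * pr * UI[parts["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))
```

For this vector the impact sub-score is 6.42 × (1 − 0.44³) ≈ 5.87 and the exploitability sub-score is 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.83; their sum of about 7.71 rounds up to 7.8. The local attack vector (0.55 rather than 0.85 for network) is what keeps the score below the 9.8 an otherwise identical network-reachable flaw would receive.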

Timeline

Published: May 14, 2021
Last Modified: November 21, 2024
First Seen: May 14, 2021

Related Vulnerabilities