CVE-2021-29577: TensorFlow: heap overflow in AvgPool3DGrad op

HIGH PoC AVAILABLE
Published May 14, 2021
CISO Take

Upgrade TensorFlow to 2.5.0, or to a patched backport release (2.1.4, 2.2.3, 2.3.3, or 2.4.2), immediately. This heap buffer overflow can enable local code execution within ML training and serving environments, a real threat on shared GPU clusters, Jupyter hubs, or MLOps platforms where multiple users submit workloads. Audit any multi-tenant ML infrastructure for exposure before assuming low risk.

What is the risk?

CVSS 7.8 (High). The local attack vector with low complexity and low privilege requirements means any authenticated user or compromised process on shared ML infrastructure can trigger this. While not directly remotely exploitable, real-world ML environments—Kubeflow clusters, shared Jupyter servers, TF Serving deployments—frequently expose TensorFlow ops to multiple principals, elevating effective exposure beyond what the local AV suggests. No known active exploitation at time of publication.

What systems are affected?

Package | Ecosystem | Vulnerable Range | Patched
TensorFlow | pip | before 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1 | 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

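Do you use TensorFlow? You are affected unless you run 2.5.0 or later, or one of the patched backport releases (2.1.4, 2.2.3, 2.3.3, 2.4.2).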

How severe is it?

CVSS 3.1
7.8 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 11% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Moderate
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

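Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High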

What should I do?

5 steps
  1. Patch: Upgrade to TensorFlow 2.5.0, or to a backport release that includes the fix (2.1.4, 2.2.3, 2.3.3, or 2.4.2). If you build TensorFlow from source, cherry-pick commit 6fc9141 onto your branch.

  2. Workaround: Add input validation to enforce matching first and last dimensions of orig_input_shape and grad before invoking AvgPool3DGrad.

  3. Isolation: Ensure ML training workloads from untrusted users run in isolated containers with dropped capabilities and no host-level privilege.

  4. Detection: Monitor TF worker processes for unexpected crashes or heap corruption signals; review core dumps if available.

  5. Inventory: Identify all TensorFlow versions in use across training, inference, and CI/CD pipelines—containerized and bare-metal.
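The workaround in step 2 can be sketched as a small guard that runs before the op is invoked. This is a minimal illustration, not the upstream fix: the helper name `check_avgpool3d_grad_shapes` is hypothetical, and it works on plain shape sequences so it can be tested without TensorFlow installed. It follows the advisory's wording (first and last dimensions must match) and additionally assumes both shapes are rank 5, since 3D pooling operates on 5-D tensors.

```python
from typing import Sequence


def check_avgpool3d_grad_shapes(orig_input_shape: Sequence[int],
                                grad_shape: Sequence[int]) -> None:
    """Hypothetical guard: raise ValueError on the shape mismatch the advisory describes.

    orig_input_shape: the five integer values passed as `orig_input_shape`.
    grad_shape:       the shape of the `grad` tensor.
    """
    orig = [int(d) for d in orig_input_shape]
    grad = [int(d) for d in grad_shape]

    # Assumption: both describe 5-D tensors (3D pooling input and its gradient).
    if len(orig) != 5 or len(grad) != 5:
        raise ValueError(
            f"expected rank-5 shapes, got orig_input_shape={orig}, grad shape={grad}"
        )

    # The unpatched kernel assumes these two dimensions agree but never checks.
    if orig[0] != grad[0]:
        raise ValueError(
            f"first dimension mismatch: orig_input_shape[0]={orig[0]}, grad.shape[0]={grad[0]}"
        )
    if orig[-1] != grad[-1]:
        raise ValueError(
            f"last dimension mismatch: orig_input_shape[-1]={orig[-1]}, grad.shape[-1]={grad[-1]}"
        )
```

In a TensorFlow program the guard would be called as `check_avgpool3d_grad_shapes(orig_input_shape, grad.shape)` immediately before `tf.raw_ops.AvgPool3DGrad(...)`. It only protects code paths that go through the wrapper, so it complements the patch rather than replacing it.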

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001
8.4 - AI system technical robustness and security
NIST AI RMF
MANAGE 2.2 - Mechanisms to address potentially adverse AI system impacts
OWASP LLM Top 10
LLM03:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2021-29577?

CVE-2021-29577 is a heap buffer overflow (CWE-787) in TensorFlow's `tf.raw_ops.AvgPool3DGrad` op. The implementation assumes that the `orig_input_shape` and `grad` tensors agree in their first and last dimensions but never checks this, so mismatched inputs lead to an out-of-bounds write. The fix ships in TensorFlow 2.5.0 and in backport releases 2.1.4, 2.2.3, 2.3.3, and 2.4.2.

Is CVE-2021-29577 actively exploited?

No active exploitation was known at the time of publication. Proof-of-concept exploit code is publicly available for CVE-2021-29577, however, which increases the risk of exploitation.

How to fix CVE-2021-29577?

  1. Patch: Upgrade to TensorFlow 2.5.0, or to a backport release that includes the fix (2.1.4, 2.2.3, 2.3.3, or 2.4.2). If you build TensorFlow from source, cherry-pick commit 6fc9141 onto your branch.

  2. Workaround: Add input validation to enforce matching first and last dimensions of orig_input_shape and grad before invoking AvgPool3DGrad.

  3. Isolation: Ensure ML training workloads from untrusted users run in isolated containers with dropped capabilities and no host-level privilege.

  4. Detection: Monitor TF worker processes for unexpected crashes or heap corruption signals; review core dumps if available.

  5. Inventory: Identify all TensorFlow versions in use across training, inference, and CI/CD pipelines, both containerized and bare-metal.
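For the detection step, one possible approach is to launch each worker through a thin supervisor that records when the process dies from a signal typical of memory corruption. The sketch below assumes a POSIX host, where `subprocess` reports death by signal as a negative return code; the function name `run_and_classify` and the choice of signals are illustrative assumptions, not part of any TensorFlow tooling.

```python
import signal
import subprocess
import sys

# Signals that commonly accompany heap corruption in a native process.
# Looked up by name so the module still imports where a signal is missing.
CRASH_SIGNALS = {
    getattr(signal, name)
    for name in ("SIGSEGV", "SIGABRT", "SIGBUS")
    if hasattr(signal, name)
}


def run_and_classify(cmd, timeout=None):
    """Run cmd and report whether it was killed by a crash-type signal."""
    proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
    sig_name = None
    suspicious = False
    if proc.returncode < 0:  # POSIX: negative value means "killed by signal N"
        signum = -proc.returncode
        try:
            sig_name = signal.Signals(signum).name
        except ValueError:
            sig_name = f"signal {signum}"
        suspicious = signum in {int(s) for s in CRASH_SIGNALS}
    return {
        "returncode": proc.returncode,
        "signal": sig_name,
        "suspicious": suspicious,
    }


if __name__ == "__main__":
    # Harmless demo: a child that exits normally is not flagged.
    print(run_and_classify([sys.executable, "-c", "print('ok')"]))
```

A crash flagged this way is only a lead for investigation: out-of-memory kills, assertion failures, and ordinary bugs raise the same signals, so core dumps and job inputs still need review.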

What systems are affected by CVE-2021-29577?

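TensorFlow releases without the fix are affected: anything earlier than 2.5.0 that is not one of the patched backport releases (2.1.4, 2.2.3, 2.3.3, 2.4.2). In terms of AI/ML architecture patterns, this covers training pipelines, model serving, and MLOps platforms.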

What is the CVSS score for CVE-2021-29577?

CVE-2021-29577 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.21%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, MLOps platforms

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0011.001 Malicious Package
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art. 15
ISO 42001: 8.4
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM03:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. The implementation of `tf.raw_ops.AvgPool3DGrad` is vulnerable to a heap buffer overflow. The implementation (https://github.com/tensorflow/tensorflow/blob/d80ffba9702dc19d1fac74fc4b766b3fa1ee976b/tensorflow/core/kernels/pooling_ops_3d.cc#L376-L450) assumes that the `orig_input_shape` and `grad` tensors have similar first and last dimensions but does not check that this assumption is validated. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
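The release list in the advisory can be turned into a quick inventory check. The sketch below is an assumption-laden helper, not an official tool: `is_patched` is a hypothetical name, the table of first fixed releases is copied from the advisory text above, and pre-release builds of a first fixed release (for example `2.5.0rc0`) are conservatively treated as unpatched because the advisory does not say whether they contain the fix.

```python
import re

# First fixed patch level per minor release line, per the advisory text.
FIRST_FIXED = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2, (2, 5): 0}

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(.*)$")
_PRERELEASE_RE = re.compile(r"^[-.]?(rc|a|b|dev)")


def is_patched(version: str) -> bool:
    """Return True if a TensorFlow version string is at or past a fixed release."""
    m = _VERSION_RE.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = int(m[1]), int(m[2]), int(m[3])
    prerelease = bool(_PRERELEASE_RE.match(m[4]))

    # Release lines after 2.5 were cut after the fix landed.
    if (major, minor) > (2, 5):
        return True

    first_fixed = FIRST_FIXED.get((major, minor))
    if first_fixed is None:
        # Older than 2.1: the advisory lists no fixed release for these lines.
        return False
    if patch > first_fixed:
        return True
    # Exactly the first fixed release: accept only final builds.
    return patch == first_fixed and not prerelease
```

In an environment with TensorFlow installed, the input would typically come from `tensorflow.__version__` or `importlib.metadata.version("tensorflow")`; container images and lock files need to be scanned separately.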

Exploitation Scenario

An attacker with data scientist-level access to a shared Kubeflow cluster submits a crafted training job that calls tf.raw_ops.AvgPool3DGrad with intentionally mismatched tensor shapes, for example an orig_input_shape with batch size 4 but a grad tensor with batch size 16. The missing shape check causes a heap buffer overflow that corrupts adjacent memory. With a carefully shaped payload, the attacker could achieve arbitrary code execution within the TensorFlow process, which could in turn allow exfiltration of co-tenants' model checkpoints, training data, or environment secrets, or serve as a foothold for attempting to escape the container to the host node.

Weaknesses (CWE)

CWE-787 — Out-of-bounds Write: The product writes data past the end, or before the beginning, of the intended buffer.

  • [Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, many languages that perform their own memory management, such as Java and Perl, are not subject to buffer overflows. Other languages, such as Ada and C#, typically provide overflow protection, but the protection can be disabled by the programmer. Be wary that a language's interface to native code may still be subject to overflows, even if the language itself is theoretically safe.
  • [Architecture and Design] Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Examples include the Safe C String Library (SafeStr) by Messier and Viega [REF-57], and the Strsafe.h library from Microsoft [REF-56]. These libraries provide safer versions of overflow-prone string-handling functions.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
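To show how this vector yields the 7.8 base score, the following sketch recomputes it using the metric weights and rounding rule from the CVSS v3.1 specification. The function names are illustrative, and only base metrics are handled (no temporal or environmental metrics).

```python
import math

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on Scope (U = unchanged, C = changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(x: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.x base score for a base-metric vector string."""
    prefix, *parts = vector.split("/")
    if not prefix.startswith("CVSS:3."):
        raise ValueError(f"not a CVSS v3.x vector: {vector!r}")
    m = dict(p.split(":", 1) for p in parts)
    scope = m["S"]

    # Impact sub-score.
    iss = 1 - ((1 - WEIGHTS["C"][m["C"]])
               * (1 - WEIGHTS["I"][m["I"]])
               * (1 - WEIGHTS["A"][m["A"]]))
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15

    # Exploitability sub-score.
    exploitability = (8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                      * PR_WEIGHTS[scope][m["PR"]] * WEIGHTS["UI"][m["UI"]])

    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))
```

For this CVE the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum, roughly 7.71, rounds up to 7.8. The local attack vector (weight 0.55) is what keeps the exploitability term low compared with a network-reachable flaw.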

Timeline

Published
May 14, 2021
Last Modified
November 21, 2024
First Seen
May 14, 2021

Related Vulnerabilities