CVE-2021-29577: TensorFlow: heap overflow in AvgPool3DGrad op

HIGH PoC AVAILABLE
Published May 14, 2021
CISO Take

Upgrade TensorFlow to 2.5.0, or to one of the patched releases 2.1.4, 2.2.3, 2.3.3, or 2.4.2, immediately. This heap buffer overflow can potentially enable local code execution within ML training and serving environments. That is a real threat on shared GPU clusters, Jupyter hubs, or MLOps platforms where multiple users submit workloads. Audit any multi-tenant ML infrastructure for exposure before assuming the risk is low.

Risk Assessment

CVSS 7.8 (High). The attack vector is local, attack complexity is low, and only low privileges are required, so any authenticated user or compromised process on shared ML infrastructure can trigger the overflow. The flaw is not directly exploitable over the network, but real-world ML environments such as Kubeflow clusters, shared Jupyter servers, and TF Serving deployments frequently expose TensorFlow ops to multiple principals. This raises the effective exposure beyond what the local attack vector suggests. No active exploitation was known at the time of publication.

Affected Systems

Package: tensorflow (pip)
Vulnerable: versions before 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1
Patched: 2.1.4, 2.2.3, 2.3.3, 2.4.2, and 2.5.0 or later

If you run tensorflow at a version in the vulnerable range, you're affected.

Severity & Risk

CVSS 3.1: 7.8 / 10
EPSS: 0.01% chance of exploitation in the next 30 days (higher than 2% of all CVEs)
Exploitation status: Exploit available
Exploitation: MEDIUM
Sophistication: Moderate
Exploitation confidence: Medium (public PoC indexed in trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

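Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High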

Recommended Action

5 steps
  1. Patch: Upgrade to TensorFlow 2.5.0, or to one of the patched releases 2.1.4, 2.2.3, 2.3.3, or 2.4.2, which carry the fix (commit 6fc9141) cherry-picked onto the 2.1 through 2.4 branches.

  2. Workaround: Add input validation to enforce matching first and last dimensions of orig_input_shape and grad before invoking AvgPool3DGrad.

  3. Isolation: Ensure ML training workloads from untrusted users run in isolated containers with dropped capabilities and no host-level privilege.

  4. Detection: Monitor TF worker processes for unexpected crashes or heap corruption signals; review core dumps if available.

  5. Inventory: Identify all TensorFlow versions in use across training, inference, and CI/CD pipelines, both containerized and bare-metal.
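The Patch and Inventory steps can be partly automated per Python environment. The following is a minimal sketch, assuming the fixed releases named in this advisory (2.1.4, 2.2.3, 2.3.3, 2.4.2, and 2.5.0 onward) and assuming TensorFlow is installed under the distribution names `tensorflow`, `tensorflow-cpu`, or `tensorflow-gpu`. The helpers `parse_version`, `is_vulnerable`, and `installed_tensorflow_versions` are hypothetical names, not part of any TensorFlow tooling.

```python
from __future__ import annotations

import re
from importlib import metadata

# Earliest fixed patch release for each affected minor line, per the advisory.
# Anything at 2.5.0 or later is assumed to include the fix.
FIXED_PATCH = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}

# Assumed distribution names; adjust for your package index or mirror.
DIST_NAMES = ("tensorflow", "tensorflow-cpu", "tensorflow-gpu")


def parse_version(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch); pre-release/local suffixes are ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_vulnerable(version: str) -> bool:
    """Return True if `version` predates the fix for CVE-2021-29577."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= (2, 5):
        return False
    fixed_patch = FIXED_PATCH.get((major, minor))
    if fixed_patch is None:
        # 1.x and 2.0.x received no backport, so treat them as vulnerable.
        return True
    return patch < fixed_patch


def installed_tensorflow_versions() -> dict[str, str]:
    """Map each installed TensorFlow distribution to its version string."""
    found: dict[str, str] = {}
    for name in DIST_NAMES:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # distribution not installed in this environment
    return found


if __name__ == "__main__":
    for dist, version in installed_tensorflow_versions().items():
        status = "VULNERABLE" if is_vulnerable(version) else "patched"
        print(f"{dist}=={version}: {status}")
```

This only inspects the interpreter it runs in, so each container image, virtualenv, and bare-metal host still has to be scanned separately. Pre-release builds are compared by their numeric part only, which may misclassify release candidates.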

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001
8.4 - AI system technical robustness and security
NIST AI RMF
MANAGE 2.2 - Mechanisms to address potentially adverse AI system impacts
OWASP LLM Top 10
LLM03:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2021-29577?

CVE-2021-29577 is a heap buffer overflow in TensorFlow's `tf.raw_ops.AvgPool3DGrad` op. The kernel assumes that the `orig_input_shape` and `grad` tensors have matching first and last dimensions but never checks this, so mismatched shapes can corrupt heap memory. It is fixed in TensorFlow 2.5.0 and in the patched releases 2.1.4, 2.2.3, 2.3.3, and 2.4.2.

Is CVE-2021-29577 actively exploited?

No active exploitation was known at the time of publication. However, proof-of-concept exploit code for CVE-2021-29577 is publicly indexed (trickest/cve), which increases the risk of exploitation.

How to fix CVE-2021-29577?

Upgrade to TensorFlow 2.5.0 or to one of the patched releases (2.1.4, 2.2.3, 2.3.3, 2.4.2). Until you can upgrade, validate that orig_input_shape and grad have matching first and last dimensions before calling AvgPool3DGrad, run untrusted workloads in isolated containers with dropped capabilities, monitor TF workers for unexpected crashes, and inventory every TensorFlow version across training, inference, and CI/CD. See Recommended Action for details.

What systems are affected by CVE-2021-29577?

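Any system running a vulnerable TensorFlow version where a user or process can invoke AvgPool3DGrad with inputs of its choosing. The affected AI/ML architecture patterns include training pipelines, model serving, and MLOps platforms.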

What is the CVSS score for CVE-2021-29577?

CVE-2021-29577 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.01%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. The implementation of `tf.raw_ops.AvgPool3DGrad` is vulnerable to a heap buffer overflow. The implementation (https://github.com/tensorflow/tensorflow/blob/d80ffba9702dc19d1fac74fc4b766b3fa1ee976b/tensorflow/core/kernels/pooling_ops_3d.cc#L376-L450) assumes that the `orig_input_shape` and `grad` tensors have matching first and last dimensions but does not validate this assumption. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
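The missing check described above can be approximated in user code, as the Workaround step suggests. The sketch below is a minimal, framework-free illustration that assumes the default `NDHWC` data format, so "first" is the batch dimension and "last" is the channel dimension. The function name `check_avgpool3dgrad_shapes` is hypothetical and not part of the TensorFlow API. You would call it with the values you intend to pass as `orig_input_shape` and with `grad.shape` before invoking `tf.raw_ops.AvgPool3DGrad`.

```python
from typing import Sequence


def check_avgpool3dgrad_shapes(orig_input_shape: Sequence[int],
                               grad_shape: Sequence[int]) -> None:
    """Raise ValueError if the shapes break the assumption the unpatched
    AvgPool3DGrad kernel relies on (NDHWC layout assumed)."""
    if len(orig_input_shape) != 5:
        raise ValueError(
            f"orig_input_shape must have 5 elements, got {len(orig_input_shape)}")
    if len(grad_shape) != 5:
        raise ValueError(f"grad must be rank 5, got rank {len(grad_shape)}")
    # First dimension (batch) must match.
    if orig_input_shape[0] != grad_shape[0]:
        raise ValueError(
            f"batch mismatch: orig_input_shape[0]={orig_input_shape[0]}, "
            f"grad.shape[0]={grad_shape[0]}")
    # Last dimension (channels) must match.
    if orig_input_shape[-1] != grad_shape[-1]:
        raise ValueError(
            f"channel mismatch: orig_input_shape[-1]={orig_input_shape[-1]}, "
            f"grad.shape[-1]={grad_shape[-1]}")


if __name__ == "__main__":
    check_avgpool3dgrad_shapes([4, 8, 8, 8, 3], [4, 4, 4, 4, 3])  # accepted
    try:
        check_avgpool3dgrad_shapes([4, 8, 8, 8, 3], [16, 4, 4, 4, 3])
    except ValueError as err:
        print(f"rejected: {err}")
```

A guard like this only protects call sites you control; any code path that reaches the op without it, including a loaded graph or SavedModel, is still exposed until TensorFlow itself is upgraded.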

Exploitation Scenario

An attacker with data-scientist-level access to a shared Kubeflow cluster submits a crafted training job that calls tf.raw_ops.AvgPool3DGrad with intentionally mismatched tensor shapes, for example an orig_input_shape with batch size 4 but a grad with batch size 16. Because the kernel never checks that these dimensions match, it overflows a heap buffer and corrupts adjacent memory. With a carefully shaped payload, the attacker could achieve arbitrary code execution within the TensorFlow process and from there exfiltrate co-tenants' model checkpoints, training data, or environment secrets. Escaping the container to the host node would additionally require a container misconfiguration or a separate escape vulnerability.
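Heap corruption from a call like this, or from a failed exploitation attempt, often surfaces as the worker dying on a signal rather than exiting cleanly, which is what the Detection step looks for. A minimal sketch, assuming a POSIX host where `subprocess` reports death-by-signal as a negative return code. The names `classify_exit`, `run_and_classify`, and `SUSPICIOUS_SIGNALS` are hypothetical, and the command you pass in is whatever launches your TF job. A crash is a reason to investigate, not proof of exploitation.

```python
import signal
import subprocess
import sys
from typing import Optional, Sequence

# Signals commonly associated with memory corruption (POSIX only).
SUSPICIOUS_SIGNALS = {signal.SIGSEGV, signal.SIGABRT, signal.SIGBUS}


def classify_exit(returncode: int) -> Optional[str]:
    """Return the signal name if the process died on a suspicious signal."""
    if returncode >= 0:
        return None  # normal exit, whether status is zero or non-zero
    try:
        sig = signal.Signals(-returncode)
    except ValueError:
        return None  # signal number with no named enum member
    return sig.name if sig in SUSPICIOUS_SIGNALS else None


def run_and_classify(cmd: Sequence[str]) -> Optional[str]:
    """Run a job command and flag memory-corruption-style crashes."""
    result = subprocess.run(list(cmd))
    verdict = classify_exit(result.returncode)
    if verdict is not None:
        print(f"ALERT: {list(cmd)!r} terminated by {verdict}; "
              "review logs and core dumps", file=sys.stderr)
    return verdict


if __name__ == "__main__":
    # Hypothetical job command; replace with your training entry point.
    run_and_classify([sys.executable, "-c", "print('job finished')"])
```

In a real deployment the same return-code check usually lives in the orchestrator (for example, a pod's termination reason) rather than in a wrapper script; the sketch only shows the classification logic.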

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
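The 7.8 base score follows from this vector via the CVSS v3.1 base-score formula. The sketch below reproduces that arithmetic for Scope:Unchanged vectors only. The metric weights and the `roundup` helper are transcribed from the FIRST CVSS v3.1 specification; the function name `base_score` is hypothetical.

```python
import math

# CVSS v3.1 metric weights (PR values are the Scope:Unchanged ones).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score of a Scope:Unchanged CVSS:3.1 vector."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    metrics = dict(part.split(":") for part in parts)
    if metrics.get("S") != "U":
        raise ValueError("this sketch only handles Scope:Unchanged")
    w = {key: WEIGHTS[key][metrics[key]] for key in WEIGHTS}
    # Impact Sub-Score, then Impact and Exploitability for Scope:Unchanged.
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.8
```

For this vector the impact term is about 5.87 and the exploitability term about 1.83; their sum (about 7.71) rounds up to 7.8.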

Timeline

Published: May 14, 2021
Last modified: November 21, 2024
First seen: May 14, 2021

Related Vulnerabilities