CVE-2021-29577: TensorFlow: heap overflow in AvgPool3DGrad op
HIGH | PoC AVAILABLE
Upgrade TensorFlow to 2.5.0, or to a backport release that carries the fix (2.1.4, 2.2.3, 2.3.3, or 2.4.2), immediately. This heap buffer overflow enables local code execution within ML training and serving environments, a real threat on shared GPU clusters, Jupyter hubs, or MLOps platforms where multiple users submit workloads. Audit any multi-tenant ML infrastructure for exposure before assuming low risk.
What is the risk?
CVSS 7.8 (High). The local attack vector, combined with low complexity and low privilege requirements, means any authenticated user or compromised process on shared ML infrastructure can trigger this. The flaw is not directly exploitable over the network, but real-world ML environments (Kubeflow clusters, shared Jupyter servers, TF Serving deployments) frequently expose TensorFlow ops to multiple principals, which raises effective exposure beyond what the local attack vector (AV:L) suggests. No active exploitation was known at the time of publication.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| TensorFlow | pip | < 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1 | 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0 |
Do you use TensorFlow? You're affected.
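Whether a given installation carries the fix can be checked with a small version comparison. The following is a minimal, unofficial sketch: the function name `is_patched` and the lookup table are illustrative, and the thresholds come from the advisory text (fix in 2.5.0, cherry-picked onto 2.1.4, 2.2.3, 2.3.3, and 2.4.2). It assumes that release lines older than 2.1 never received the fix, and it ignores pre-release suffixes such as `rc0`.

```python
import re

# Lowest patch level that carries the fix, per minor release line.
# Values are taken from the advisory text; older lines are assumed unpatched.
PATCHED_MIN = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def is_patched(version: str) -> bool:
    """Return True if the version string names a release with the fix.

    Only the leading MAJOR.MINOR.PATCH numbers are considered, so a
    pre-release such as "2.5.0rc0" is treated like "2.5.0" (an assumption).
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    if (major, minor) >= (2, 5):
        return True  # 2.5.0 and later include the fix
    floor = PATCHED_MIN.get((major, minor))
    return floor is not None and patch >= floor
```

Where TensorFlow is installed, the version string could come from `importlib.metadata.version("tensorflow")` on Python 3.8 or later. The distribution may also be installed under another name, such as `tensorflow-gpu` or `tensorflow-cpu`.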
What should I do?
5 steps:
- Patch: Upgrade to TensorFlow 2.5.0, or to a backport release that includes commit 6fc9141 (2.1.4, 2.2.3, 2.3.3, 2.4.2).
- Workaround: Add input validation to enforce matching first and last dimensions of orig_input_shape and grad before invoking AvgPool3DGrad.
- Isolation: Ensure ML training workloads from untrusted users run in isolated containers with dropped capabilities and no host-level privilege.
- Detection: Monitor TF worker processes for unexpected crashes or heap corruption signals; review core dumps if available.
- Inventory: Identify all TensorFlow versions in use across training, inference, and CI/CD pipelines, both containerized and bare-metal.
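The workaround step can be sketched as a guard that runs before the op is called. This is a minimal sketch, not the upstream fix: the helper names are hypothetical, the code works on plain Python sequences so it needs no TensorFlow import, and it assumes that "first and last dimensions" means the first and last entries of the 5-element `orig_input_shape` vector must equal the first and last dimensions of the 5-D `grad` tensor.

```python
from typing import List, Sequence


def avgpool3d_grad_shape_errors(
    orig_input_shape: Sequence[int], grad_shape: Sequence[int]
) -> List[str]:
    """Return a list of problems; an empty list means the shapes look consistent."""
    errors: List[str] = []
    if len(orig_input_shape) != 5:
        errors.append(
            f"orig_input_shape must have 5 entries, got {len(orig_input_shape)}"
        )
    if len(grad_shape) != 5:
        errors.append(f"grad must be 5-D, got rank {len(grad_shape)}")
    if errors:
        # Without the right ranks, the index comparisons below are meaningless.
        return errors
    if orig_input_shape[0] != grad_shape[0]:
        errors.append(
            f"first dimension mismatch: {orig_input_shape[0]} vs {grad_shape[0]}"
        )
    if orig_input_shape[-1] != grad_shape[-1]:
        errors.append(
            f"last dimension mismatch: {orig_input_shape[-1]} vs {grad_shape[-1]}"
        )
    return errors


def require_valid_avgpool3d_grad_shapes(
    orig_input_shape: Sequence[int], grad_shape: Sequence[int]
) -> None:
    """Raise ValueError rather than let a mismatched call reach the kernel."""
    errors = avgpool3d_grad_shape_errors(orig_input_shape, grad_shape)
    if errors:
        raise ValueError("; ".join(errors))
```

With eager tensors, the two arguments could be obtained as `orig_input_shape.numpy().tolist()` and `grad.shape.as_list()`. A guard like this only protects code paths that route through it, so it complements upgrading rather than replacing it.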
Frequently Asked Questions
What is CVE-2021-29577?
CVE-2021-29577 is a heap buffer overflow in TensorFlow's `tf.raw_ops.AvgPool3DGrad` op. The implementation assumes that `orig_input_shape` and `grad` agree in their first and last dimensions but never checks this, so mismatched inputs cause an out-of-bounds write (CWE-787). On shared GPU clusters, Jupyter hubs, or MLOps platforms where multiple users submit workloads, this enables local code execution within training and serving processes. The fix ships in TensorFlow 2.5.0 and in the 2.1.4, 2.2.3, 2.3.3, and 2.4.2 backport releases.
Is CVE-2021-29577 actively exploited?
No active exploitation was known at the time of publication. However, proof-of-concept exploit code for CVE-2021-29577 is publicly available, which increases the risk of exploitation.
How to fix CVE-2021-29577?
1. Patch: Upgrade to TensorFlow 2.5.0, or to a backport release that includes commit 6fc9141 (2.1.4, 2.2.3, 2.3.3, 2.4.2).
2. Workaround: Add input validation to enforce matching first and last dimensions of orig_input_shape and grad before invoking AvgPool3DGrad.
3. Isolation: Ensure ML training workloads from untrusted users run in isolated containers with dropped capabilities and no host-level privilege.
4. Detection: Monitor TF worker processes for unexpected crashes or heap corruption signals; review core dumps if available.
5. Inventory: Identify all TensorFlow versions in use across training, inference, and CI/CD pipelines, both containerized and bare-metal.
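The detection step can be approximated by classifying how a worker process exited. The sketch below is an illustration under stated assumptions: the function name `crash_signal` is hypothetical, it treats SIGSEGV, SIGABRT, and SIGBUS as the signals of interest, and it relies on two conventions. Python's `subprocess` reports death by signal as a negative return code on POSIX, and shells and container runtimes commonly report it as 128 plus the signal number. A crash of this kind is a hint of memory corruption, not proof of exploitation.

```python
import signal
from typing import Optional

# Signals that commonly accompany memory corruption.
# SIGBUS does not exist on every platform, so it is added only when present.
_SUSPICIOUS = {signal.SIGSEGV, signal.SIGABRT}
if hasattr(signal, "SIGBUS"):
    _SUSPICIOUS.add(signal.SIGBUS)


def crash_signal(returncode: int) -> Optional[str]:
    """Return the signal name if the exit status suggests a memory fault."""
    if returncode < 0:
        signum = -returncode        # subprocess convention on POSIX
    elif returncode > 128:
        signum = returncode - 128   # shell / container convention
    else:
        return None                 # normal exit or ordinary error code
    try:
        sig = signal.Signals(signum)
    except ValueError:
        return None                 # not a known signal number
    return sig.name if sig in _SUSPICIOUS else None
```

A job runner could call `crash_signal(proc.returncode)` after each TensorFlow worker finishes and raise an alert, or preserve the core dump, whenever the result is not `None`.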
What systems are affected by CVE-2021-29577?
This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, MLOps platforms.
What is the CVSS score for CVE-2021-29577?
CVE-2021-29577 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.21%.
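The 7.8 figure can be reproduced from the vector string `CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H` listed under CVSS Vector. The sketch below follows the base-score equations and metric weights of the CVSS v3.1 specification, restricted to Scope: Unchanged for brevity; the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged values).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged CVSS v3.1 vector."""
    # Skip the leading "CVSS:3.1" element, then split each "metric:value" pair.
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    if metrics.get("S") != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: table[metrics[name]] for name, table in WEIGHTS.items()}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))
```

For this CVE the exploitability term is about 1.83 and the impact term about 5.87; their sum, roughly 7.71, rounds up to 7.8.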
What is the AI security impact?
Affected AI Architectures
MITRE ATLAS Techniques
- AML.T0010.001 AI Software
- AML.T0011.001 Malicious Package
- AML.T0049 Exploit Public-Facing Application
Compliance Controls Affected
What are the technical details?
Original Advisory
TensorFlow is an end-to-end open source platform for machine learning. The implementation of `tf.raw_ops.AvgPool3DGrad` is vulnerable to a heap buffer overflow. The [implementation](https://github.com/tensorflow/tensorflow/blob/d80ffba9702dc19d1fac74fc4b766b3fa1ee976b/tensorflow/core/kernels/pooling_ops_3d.cc#L376-L450) assumes that the `orig_input_shape` and `grad` tensors have similar first and last dimensions but does not check that this assumption is validated. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
Exploitation Scenario
An attacker with data scientist-level access to a shared Kubeflow cluster submits a crafted training job that calls tf.raw_ops.AvgPool3DGrad with intentionally mismatched tensor shapes—e.g., orig_input_shape with batch size 4 but grad with batch size 16. The missing bounds check causes a heap buffer overflow, corrupting adjacent memory. With a shaped payload, the attacker achieves arbitrary code execution within the TensorFlow process, enabling exfiltration of co-tenants' model checkpoints, training data, or environment secrets, or escaping the container to the host node.
Weaknesses (CWE)
CWE-787 Out-of-bounds Write
Primary
CWE-119 Improper Restriction of Operations within the Bounds of a Memory Buffer
CWE-787 Out-of-bounds Write: The product writes data past the end, or before the beginning, of the intended buffer.
- [Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, many languages that perform their own memory management, such as Java and Perl, are not subject to buffer overflows. Other languages, such as Ada and C#, typically provide overflow protection, but the protection can be disabled by the programmer. Be wary that a language's interface to native code may still be subject to overflows, even if the language itself is theoretically safe.
- [Architecture and Design] Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Examples include the Safe C String Library (SafeStr) by Messier and Viega [REF-57], and the Strsafe.h library from Microsoft [REF-56]. These libraries provide safer versions of overflow-prone string-handling functions.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
References
- github.com/tensorflow/tensorflow/commit/6fc9141f42f6a72180ecd24021c3e6b36165fe0d Patch 3rd Party
- github.com/tensorflow/tensorflow/security/advisories/GHSA-v6r6-84gr-92rm Exploit Patch 3rd Party
Related Vulnerabilities
- CVE-2020-15196 9.9 TensorFlow: heap OOB read in sparse/ragged count ops (same package: tensorflow)
- CVE-2020-15205 9.8 TensorFlow: heap overflow in StringNGrams, ASLR bypass (same package: tensorflow)
- CVE-2020-15208 9.8 TFLite: OOB read/write via tensor dimension mismatch (same package: tensorflow)
- CVE-2019-16778 9.8 TensorFlow: heap overflow in UnsortedSegmentSum op (same package: tensorflow)
- CVE-2022-23587 9.8 TensorFlow: integer overflow in Grappler enables RCE (same package: tensorflow)