CVE-2021-29576: TensorFlow: heap buffer overflow in MaxPool3DGradGrad op
HIGH | PoC AVAILABLE
A heap buffer overflow in TensorFlow's MaxPool3DGradGrad operation can lead to arbitrary code execution by a local low-privileged user. Shared ML training infrastructure and multi-tenant Jupyter/GPU environments carry the highest exposure. Patch to TF 2.5.0 or apply the available backports immediately; enforce sandboxed execution of untrusted TF computation graphs as a compensating control.
What is the risk?
CVSS v3.1 base score 7.8 (High): local attack vector, low privileges required, no user interaction. Real-world risk is concentrated in multi-tenant ML training environments such as shared GPU clusters, internal Jupyter hubs, and MLOps platforms (Kubeflow, Vertex AI Workbench). Because attack complexity is low, a moderately skilled attacker who already has local access can reliably trigger the overflow. Isolated single-user workstations are less urgent but should still be patched, given the high confidentiality, integrity, and availability impact (C:H/I:H/A:H).
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| TensorFlow | pip | < 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1 | 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0 |
If you run a TensorFlow version in one of the vulnerable ranges above, you are affected.
What should I do?
5 steps:
- Upgrade TensorFlow to 2.5.0 or later, or to the patched release on your branch: 2.4.2, 2.3.3, 2.2.3, or 2.1.4 (fix commit 63c6a29d0f2d).
- Audit TF versions across training servers, Docker images, and CI/CD pipelines, and pin them to patched versions.
- Restrict execution of untrusted or user-submitted TF computation graphs using containerization and seccomp/AppArmor profiles.
- In multi-tenant ML platforms, enforce least privilege for workload runners; avoid running training jobs as root.
- Monitor TF workload processes for anomalous behavior (unexpected child processes, unusual memory access patterns).
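The version audit above can be partly automated. The following is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`) and that TensorFlow is installed under one of the pip distribution names `tensorflow`, `tensorflow-cpu`, or `tensorflow-gpu`; that list of names and the helper `is_patched` are illustrative assumptions, not part of the advisory. The patched thresholds come from the backport list above. Pre-release and locally tagged builds are deliberately reported as unpatched, because a version string alone cannot show whether they contain the fix.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Fixed releases per the advisory: 2.5.0, plus backports on older release lines.
# Maps (major, minor) -> first patched patch level on that line.
PATCHED_BACKPORTS = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
FIRST_FIXED_LINE = (2, 5)  # every final release from 2.5.0 onward includes the fix


def is_patched(tf_version: str) -> bool:
    """Return True if a TensorFlow version string is at or above a fixed release.

    Only plain X.Y.Z final releases are recognised; pre-release, dev, or
    local builds (e.g. "2.5.0rc1") are conservatively reported as unpatched.
    """
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", tf_version.strip())
    if m is None:
        return False
    major, minor, patch = (int(part) for part in m.groups())
    if (major, minor) >= FIRST_FIXED_LINE:
        return True
    min_patch = PATCHED_BACKPORTS.get((major, minor))
    # Release lines without a backport (e.g. 1.x, 2.0) are treated as vulnerable.
    return min_patch is not None and patch >= min_patch


if __name__ == "__main__":
    # Hypothetical list of distribution names to probe; extend for your images.
    for dist in ("tensorflow", "tensorflow-cpu", "tensorflow-gpu"):
        try:
            installed = version(dist)
        except PackageNotFoundError:
            continue  # this distribution is not installed here
        status = "patched" if is_patched(installed) else "VULNERABLE or unknown"
        print(f"{dist} {installed}: {status}")
```

Running this inside each training image or CI job gives a quick per-environment answer; pinning (for example `tensorflow>=2.5.0` or the matching backport in a requirements file) is still what prevents regressions.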
Frequently Asked Questions
Is CVE-2021-29576 actively exploited?
Proof-of-concept exploit code is publicly available for CVE-2021-29576, increasing the risk of exploitation.
What systems are affected by CVE-2021-29576?
This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML platforms.
What is the CVSS score for CVE-2021-29576?
CVE-2021-29576 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.21%.
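To show where the 7.8 comes from, here is a minimal sketch of the CVSS v3.1 base-score arithmetic applied to this CVE's vector, `CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H`. The metric weights and the round-up rule are transcribed from the FIRST CVSS v3.1 specification and should be checked against it before reuse; the function names are illustrative, and temporal and environmental metrics are ignored.

```python
import math

# CVSS v3.1 base-metric weights (FIRST CVSS v3.1 specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, using the spec's float-safe integer method."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a vector string (base metrics only)."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(p.split(":", 1) for p in parts[1:])
    changed = m["S"] == "C"

    # Impact sub-score: combine confidentiality, integrity, availability.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score; PR weight depends on whether scope changes.
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.8
```

For this vector the impact sub-score is about 5.87 and exploitability about 1.83; their sum (about 7.71) rounds up to 7.8. The local attack vector (0.55 versus 0.85 for network) is what keeps the score below the 9.8 seen for network-reachable TensorFlow bugs with the same C:H/I:H/A:H impact.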
What is the AI security impact?
MITRE ATLAS Techniques
- AML.T0010.001: AI Software
- AML.T0049: Exploit Public-Facing Application
What are the technical details?
Original Advisory
TensorFlow is an end-to-end open source platform for machine learning. The implementation of `tf.raw_ops.MaxPool3DGradGrad` is vulnerable to a heap buffer overflow. The implementation (https://github.com/tensorflow/tensorflow/blob/596c05a159b6fbb9e39ca10b3f7753b7244fa1e9/tensorflow/core/kernels/pooling_ops_3d.cc#L694-L696) does not check that the initialization of `Pool3dParameters` completes successfully. Since the constructor (https://github.com/tensorflow/tensorflow/blob/596c05a159b6fbb9e39ca10b3f7753b7244fa1e9/tensorflow/core/kernels/pooling_ops_3d.cc#L48-L88) uses `OP_REQUIRES` to validate conditions, the first assertion that fails interrupts the initialization of `params`, leaving it with invalid data. In turn, this might cause a heap buffer overflow, depending on default-initialized values. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
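The subtle point is that `OP_REQUIRES` records a failure on the kernel context and then executes `return` from the function it appears in. Inside the `Pool3dParameters` constructor, that exits only the constructor; the calling kernel keeps running with a half-initialized object unless it checks the context status itself. The sketch below is a hypothetical Python model of that control flow, not TensorFlow code: `Context`, `Pool3dParams`, `INDETERMINATE`, and `last_write_index` are illustrative names, and a large sentinel stands in for whatever uninitialized C++ fields happen to contain. The `check_status=True` branch mirrors the kind of fix the advisory implies, namely checking the status immediately after constructing the parameters.

```python
from typing import List, Optional

# Stand-in for indeterminate memory left in fields that were never assigned.
INDETERMINATE = 1 << 20


class Context:
    """Minimal stand-in for an op kernel context that records a failure status."""

    def __init__(self) -> None:
        self.error: Optional[str] = None

    def fail(self, message: str) -> None:
        self.error = message

    def ok(self) -> bool:
        return self.error is None


class Pool3dParams:
    """Models a constructor whose validation checks behave like OP_REQUIRES."""

    def __init__(self, ctx: Context, ksize: List[int], out_len: int) -> None:
        # If a check fails before assignment, this field keeps the stale value.
        self.out_len = INDETERMINATE
        if len(ksize) != 5:
            ctx.fail("ksize must have 5 elements")
            return  # like OP_REQUIRES: leaves the constructor, not the kernel
        self.out_len = out_len


def last_write_index(ksize: List[int], buffer_len: int, check_status: bool) -> Optional[int]:
    """Return the highest buffer index the modelled kernel would write,
    or None if it stops early because parameter setup failed."""
    ctx = Context()
    params = Pool3dParams(ctx, ksize, buffer_len)
    if check_status and not ctx.ok():
        return None  # patched pattern: bail out before touching memory
    # Loop bounds come from params, not from the buffer actually allocated.
    return params.out_len - 1
```

With an invalid `ksize` such as `[1, 2]` and an 8-element buffer, the unchecked path reports a write index far past index 7 (in C++, a heap write out of bounds), while the checked path returns `None` and never uses the invalid parameters.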
Exploitation Scenario
An attacker with shell access on a shared GPU training server (e.g., a compromised data scientist account) crafts a Python script that calls `tf.raw_ops.MaxPool3DGradGrad` with arguments chosen to fail one of the `Pool3dParameters` validation checks. The failing `OP_REQUIRES` check aborts the constructor, leaving `params` partially initialized with invalid data. Because the kernel does not check for that failure, it proceeds with the invalid parameters and writes outside the heap buffer, which may let the attacker corrupt heap metadata and achieve code execution as the user that owns the TF process. In MLOps environments where training jobs run as privileged service accounts or inside containers with host mounts, this could escalate to full host or cluster compromise.
Weaknesses (CWE)
CWE-787 Out-of-bounds Write (Primary)
CWE-119 Improper Restriction of Operations within the Bounds of a Memory Buffer
CWE-787 (Out-of-bounds Write): The product writes data past the end, or before the beginning, of the intended buffer.
- [Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, many languages that perform their own memory management, such as Java and Perl, are not subject to buffer overflows. Other languages, such as Ada and C#, typically provide overflow protection, but the protection can be disabled by the programmer. Be wary that a language's interface to native code may still be subject to overflows, even if the language itself is theoretically safe.
- [Architecture and Design] Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Examples include the Safe C String Library (SafeStr) by Messier and Viega [REF-57], and the Strsafe.h library from Microsoft [REF-56]. These libraries provide safer versions of overflow-prone string-handling functions.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
References
- github.com/tensorflow/tensorflow/commit/63c6a29d0f2d692b247f7bf81f8732d6442fad09 (patch)
- github.com/tensorflow/tensorflow/security/advisories/GHSA-7cqx-92hp-x6wh (advisory; exploit, patch)
Related Vulnerabilities
- CVE-2020-15196 (9.9): TensorFlow: heap OOB read in sparse/ragged count ops (same package: tensorflow)
- CVE-2020-15205 (9.8): TensorFlow: heap overflow in StringNGrams, ASLR bypass (same package: tensorflow)
- CVE-2020-15208 (9.8): TFLite: OOB read/write via tensor dimension mismatch (same package: tensorflow)
- CVE-2019-16778 (9.8): TensorFlow: heap overflow in UnsortedSegmentSum op (same package: tensorflow)
- CVE-2022-23587 (9.8): TensorFlow: integer overflow in Grappler enables RCE (same package: tensorflow)