CVE-2021-29612: TensorFlow: heap overflow in linalg op, RCE risk
HIGH | PoC AVAILABLE
Heap buffer overflow in TensorFlow's BandedTriangularSolve kernel allows low-privileged local code execution, with full confidentiality, integrity, and availability impact. Patch immediately to TF 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4. Shared ML platforms (Jupyter, Kubeflow, MLflow) where users submit arbitrary model code are at highest risk.
What is the risk?
The CVSS score is 7.8 (High). The local attack vector limits direct internet exposure, but shared ML training infrastructure substantially raises real-world risk. Attack complexity is low and no user interaction is required. The root cause is a double failure: empty-tensor validation is missing, and the status set by OP_REQUIRES is never checked, which makes the bug straightforward to trigger. There is no evidence of active exploitation in the wild, but the GitHub advisory includes an exploit reference.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| TensorFlow | pip | before 2.1.4; 2.2.0-2.2.2; 2.3.0-2.3.2; 2.4.0-2.4.1 | 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0 |
Do you run a TensorFlow release older than the patched versions? You're affected.
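A quick way to test a given installation against the fixed releases named in this advisory is sketched below. It assumes plain `major.minor.patch` version strings, ignores pre-release suffixes, and conservatively treats every series older than 2.1 as vulnerable because no backport is listed for it. The names `FIXED_IN`, `parse_version`, and `is_vulnerable` are illustrative and not part of TensorFlow.

```python
import re
from typing import Tuple

# First fixed patch release per minor series, taken from the advisory text.
FIXED_IN = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
FIRST_FIXED_SERIES = (2, 5)  # 2.5.0 and later ship the fix


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch); suffixes such as 'rc0' are ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_vulnerable(version: str) -> bool:
    """Return True if the version predates the fix for CVE-2021-29612."""
    major, minor, patch = parse_version(version)
    series = (major, minor)
    if series >= FIRST_FIXED_SERIES:
        return False
    if series in FIXED_IN:
        return patch < FIXED_IN[series]
    # Assumption: series without a listed backport are treated as vulnerable.
    return True
```

On a host with TensorFlow installed, the same check could be applied to `tf.__version__`; for container images, the version string would have to come from the image's package metadata.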
What should I do?
1) Patch: upgrade to TF 2.5.0, or to one of the backport releases 2.4.2, 2.3.3, 2.2.3, or 2.1.4.
2) Immediate workaround if patching is delayed: restrict access to raw TF ops in multi-tenant environments; validate that tensors are non-empty before invoking BandedTriangularSolve.
3) Architecture: sandbox ML workload execution with process isolation (containers, VMs) to limit blast radius.
4) Detection: monitor for anomalous process behavior or unexpected memory errors from ML workers.
5) Inventory all TF versions across training and inference environments; containerized deployments are easy to miss.
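The non-empty check from step 2 can be sketched on shapes alone, without importing TensorFlow. This is a minimal illustration under stated assumptions: shapes arrive as sequences of integers (with `None` for an unknown dimension), and the function and parameter names (`check_banded_solve_inputs`, `matrix_shape`, `rhs_shape`) are hypothetical.

```python
from typing import Optional, Sequence


def is_empty_shape(shape: Sequence[Optional[int]]) -> bool:
    """True if any dimension is zero, meaning the tensor holds no elements."""
    return any(dim == 0 for dim in shape)


def check_banded_solve_inputs(
    matrix_shape: Sequence[Optional[int]],
    rhs_shape: Sequence[Optional[int]],
) -> None:
    """Raise ValueError unless both shapes are fully known and non-empty."""
    for name, shape in (("matrix", matrix_shape), ("rhs", rhs_shape)):
        if any(dim is None for dim in shape):
            # Unknown dimensions cannot be proven non-empty, so reject them.
            raise ValueError(f"{name} has an unknown dimension: {tuple(shape)}")
        if is_empty_shape(shape):
            raise ValueError(f"{name} is empty: {tuple(shape)}")
```

A wrapper around `tf.raw_ops.BandedTriangularSolve` could call this guard first. It only protects code paths that go through the wrapper, so it does not stop a user who can call the raw op directly; patching remains the actual fix.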
Frequently Asked Questions
Is CVE-2021-29612 actively exploited?
No active exploitation in the wild has been reported. However, proof-of-concept exploit code is publicly available for CVE-2021-29612, which increases the risk of exploitation.
What systems are affected by CVE-2021-29612?
This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, ML platforms, notebook environments.
What is the CVSS score for CVE-2021-29612?
CVE-2021-29612 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.29%.
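The 7.8 base score can be reproduced from the vector string listed under CVSS Vector. The sketch below follows the CVSS v3.1 base-score equations for the Scope: Unchanged case only; the names `WEIGHTS`, `roundup`, and `base_score` are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged values).
IMPACT = {"H": 0.56, "L": 0.22, "N": 0.0}
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": IMPACT,
    "I": IMPACT,
    "A": IMPACT,
}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise NotImplementedError("this sketch covers Scope: Unchanged only")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum of roughly 7.71 rounds up to 7.8.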
What is the AI security impact?
MITRE ATLAS Techniques
- AML.T0010.001 AI Software
- AML.T0011.001 Malicious Package
- AML.T0049 Exploit Public-Facing Application
What are the technical details?
Original Advisory
TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a heap buffer overflow in the Eigen implementation of `tf.raw_ops.BandedTriangularSolve`. The implementation (https://github.com/tensorflow/tensorflow/blob/eccb7ec454e6617738554a255d77f08e60ee0808/tensorflow/core/kernels/linalg/banded_triangular_solve_op.cc#L269-L278) calls `ValidateInputTensors` for input validation but fails to validate that the two tensors are not empty. Furthermore, since the `OP_REQUIRES` macro only stops execution of the current function after setting `ctx->status()` to a non-OK value, callers of helper functions that use `OP_REQUIRES` must check the value of `ctx->status()` before continuing. This doesn't happen in this op's implementation (https://github.com/tensorflow/tensorflow/blob/eccb7ec454e6617738554a255d77f08e60ee0808/tensorflow/core/kernels/linalg/banded_triangular_solve_op.cc#L219), hence the validation that is present is also not effective. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
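The control-flow problem the advisory describes can be modeled in a few lines of Python. This is a toy model, not the TensorFlow source: `Context`, `validate_inputs`, `compute_unchecked`, and `compute_checked` are hypothetical names, and the helper here includes an emptiness check purely to isolate the second failure (per the advisory, the real helper lacked that check as well).

```python
class Context:
    """Stand-in for the kernel context; holds an error status or None."""

    def __init__(self):
        self.status = None  # None plays the role of an OK status


def validate_inputs(ctx, matrix, rhs):
    # Like a helper built on OP_REQUIRES: on failure it records the error
    # and returns from *this* function only. The caller keeps running
    # unless it inspects ctx.status itself.
    if len(matrix) == 0 or len(rhs) == 0:
        ctx.status = "InvalidArgument: inputs must be non-empty"
        return


def compute_unchecked(ctx, matrix, rhs):
    validate_inputs(ctx, matrix, rhs)
    # Bug pattern: the status is never inspected, so the solve is reached
    # even though validation failed.
    return "solve reached"


def compute_checked(ctx, matrix, rhs):
    validate_inputs(ctx, matrix, rhs)
    if ctx.status is not None:  # the caller-side check the advisory calls for
        return None
    return "solve reached"
```

With empty inputs, `compute_unchecked` still reaches the solve step while `ctx.status` holds an error, whereas `compute_checked` stops early. In the real kernel, reaching the solve step with empty tensors is what leads to the out-of-bounds heap access.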
Exploitation Scenario
An adversary with access to a shared ML training platform (internal Jupyter hub, Kubeflow pipeline, or MLflow experiment server) submits a crafted TensorFlow model that invokes tf.raw_ops.BandedTriangularSolve with an empty input tensor. Because empty-tensor validation is missing and the status set by OP_REQUIRES is never checked, the Eigen implementation runs on the empty input and accesses memory outside the allocated heap buffer. If exploitation succeeds, the attacker gains code execution as the training worker process, which typically has access to cloud storage credentials, training datasets, and internal ML infrastructure over the network.
Weaknesses (CWE)
- CWE-787 Out-of-bounds Write (Primary)
- CWE-120 Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')
CWE-787, Out-of-bounds Write: The product writes data past the end, or before the beginning, of the intended buffer.
- [Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, many languages that perform their own memory management, such as Java and Perl, are not subject to buffer overflows. Other languages, such as Ada and C#, typically provide overflow protection, but the protection can be disabled by the programmer. Be wary that a language's interface to native code may still be subject to overflows, even if the language itself is theoretically safe.
- [Architecture and Design] Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Examples include the Safe C String Library (SafeStr) by Messier and Viega [REF-57], and the Strsafe.h library from Microsoft [REF-56]. These libraries provide safer versions of overflow-prone string-handling functions.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
References
- github.com/tensorflow/tensorflow/commit/0ab290774f91a23bebe30a358fde4e53ab4876a0 Patch 3rd Party
- github.com/tensorflow/tensorflow/commit/ba6822bd7b7324ba201a28b2f278c29a98edbef2 Patch 3rd Party
- github.com/tensorflow/tensorflow/security/advisories/GHSA-2xgj-xhgf-ggjv Exploit Patch 3rd Party
Related Vulnerabilities
- CVE-2020-15196 9.9 TensorFlow: heap OOB read in sparse/ragged count ops (same package: tensorflow)
- CVE-2020-15205 9.8 TensorFlow: heap overflow in StringNGrams, ASLR bypass (same package: tensorflow)
- CVE-2020-15208 9.8 TFLite: OOB read/write via tensor dimension mismatch (same package: tensorflow)
- CVE-2019-16778 9.8 TensorFlow: heap overflow in UnsortedSegmentSum op (same package: tensorflow)
- CVE-2022-23587 9.8 TensorFlow: integer overflow in Grappler enables RCE (same package: tensorflow)