CVE-2021-29612: TensorFlow: heap overflow in linalg op, RCE risk

HIGH PoC AVAILABLE

Published May 14, 2021

CISO Take

A heap buffer overflow in TensorFlow's BandedTriangularSolve kernel lets a low-privileged local user execute code, with full confidentiality, integrity, and availability impact. Patch immediately to TF 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4. Shared ML platforms (Jupyter, Kubeflow, MLflow) where users submit arbitrary model code are at highest risk.

What is the risk?

CVSS 7.8 (High). The local attack vector limits direct internet exposure, but shared ML training infrastructure substantially raises real-world risk. Attack complexity is low, no user interaction is required, and the root cause is a double failure: missing empty-tensor validation and an unchecked status after OP_REQUIRES, which together make exploitation straightforward. There is no evidence of active exploitation in the wild, but the GitHub advisory includes an exploit reference.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	releases earlier than the patched versions	2.5.0, 2.4.2, 2.3.3, 2.2.3, 2.1.4

If you run a TensorFlow release older than the patched versions (2.5.0, 2.4.2, 2.3.3, 2.2.3, 2.1.4), you are affected.

How severe is it?

CVSS 3.1

7.8 / 10

EPSS

0.3%

chance of exploitation in 30 days

Higher than 20% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Moderate

Exploitation Confidence

medium

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Local

AC Low

PR Low

UI None

S Unchanged

C High

I High

A High
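
For reference, the metrics above correspond to the vector string CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H. The sketch below recomputes the base score from those metrics, using the metric weights and rounding rule published in the CVSS v3.1 specification. It handles only the Scope: Unchanged case, and the function and constant names are illustrative rather than part of any official tool.

```python
import math

# Metric weights from the CVSS v3.1 specification for the values listed above.
AV_LOCAL = 0.55
AC_LOW = 0.77
PR_LOW_SCOPE_UNCHANGED = 0.62
UI_NONE = 0.85
CIA_HIGH = 0.56


def roundup(value):
    """CVSS v3.1 Roundup: smallest number with one decimal that is >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """Base score for a vector whose Scope metric is Unchanged."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)       # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_LOCAL, AC_LOW, PR_LOW_SCOPE_UNCHANGED, UI_NONE,
    CIA_HIGH, CIA_HIGH, CIA_HIGH,
)
print(score)  # 7.8
```

The exploitability term comes to roughly 1.8 and the impact term to roughly 5.9, which shows that the High rating is driven mostly by the full C/I/A impact rather than by ease of access.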

What should I do?

1) Patch: upgrade to TF 2.5.0, or to one of the backport releases 2.4.2, 2.3.3, 2.2.3, or 2.1.4.
2) Immediate workaround if patching is delayed: restrict access to raw TF ops in multi-tenant environments, and validate that tensors are non-empty before invoking BandedTriangularSolve.
3) Architecture: sandbox ML workload execution with process isolation (containers, VMs) to limit blast radius.
4) Detection: monitor for anomalous process behavior or unexpected memory errors from ML workers.
5) Inventory: record all TF versions across training and inference environments. Containerized deployments are easy to miss.
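
As a starting point for the patch and inventory steps, the following minimal sketch classifies a TensorFlow version string against the patched releases named in this advisory. It is not an official tool. The function names are hypothetical, pre-release and local suffixes (such as `rc1`) are ignored, and release lines older than 2.1 are treated as unpatched only because no fixed release is listed for them here, which is a conservative assumption.

```python
import re

# Minimum fixed patch level per release line, taken from the advisory text.
FIXED_PATCH_LEVEL = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
FIRST_FIXED_LINE = (2, 5)  # 2.5.0 and later include the fix


def parse_version(version):
    """Return (major, minor, patch) from strings such as '2.4.1'.

    Anything after the third number (for example 'rc1') is ignored.
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version):
    """True if the version is at or above a patched release for its line."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= FIRST_FIXED_LINE:
        return True
    fixed = FIXED_PATCH_LEVEL.get((major, minor))
    if fixed is None:
        # Assumption: no fix is listed for this release line, so flag it.
        return False
    return patch >= fixed


for candidate in ("2.4.1", "2.4.2", "2.5.0"):
    print(candidate, "patched" if is_patched(candidate) else "needs upgrade")
```

In an environment where TensorFlow is installed, the string to check would typically come from `tf.__version__`. For container images, the same function could be applied to versions read from a package listing.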

How is it classified?

Code Execution
Framework
Inference
AML.T0010.001 - AI Software
AML.T0011.001 - Malicious Package
AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Art.15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.1 - AI system lifecycle management

NIST AI RMF

MANAGE-2.2 - Treatments, responses, and prioritization for identified AI risks

OWASP LLM Top 10

LLM05:2025 - Insecure Output Handling / Supply Chain Vulnerabilities

Frequently Asked Questions

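What is CVE-2021-29612?

CVE-2021-29612 is a heap buffer overflow in the Eigen implementation of tf.raw_ops.BandedTriangularSolve in TensorFlow. It can be triggered by passing empty tensors to the op, because the kernel neither rejects empty inputs nor checks the status set by its validation helper.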

Is CVE-2021-29612 actively exploited?

No active exploitation in the wild has been reported. However, proof-of-concept exploit code is publicly available for CVE-2021-29612, which increases the risk of exploitation.

How to fix CVE-2021-29612?

1) Patch: upgrade to TF 2.5.0, or to one of the backport releases 2.4.2, 2.3.3, 2.2.3, or 2.1.4.
2) Immediate workaround if patching is delayed: restrict access to raw TF ops in multi-tenant environments, and validate that tensors are non-empty before invoking BandedTriangularSolve.
3) Architecture: sandbox ML workload execution with process isolation (containers, VMs) to limit blast radius.
4) Detection: monitor for anomalous process behavior or unexpected memory errors from ML workers.
5) Inventory: record all TF versions across training and inference environments. Containerized deployments are easy to miss.

What systems are affected by CVE-2021-29612?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, ML platforms, notebook environments.

What is the CVSS score for CVE-2021-29612?

CVE-2021-29612 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.29%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, ML platforms, notebook environments

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0011.001 Malicious Package

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art.15

ISO 42001: A.6.1

NIST AI RMF: MANAGE-2.2

OWASP LLM Top 10: LLM05:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a heap buffer overflow in the Eigen implementation of `tf.raw_ops.BandedTriangularSolve`. The implementation (https://github.com/tensorflow/tensorflow/blob/eccb7ec454e6617738554a255d77f08e60ee0808/tensorflow/core/kernels/linalg/banded_triangular_solve_op.cc#L269-L278) calls `ValidateInputTensors` for input validation but fails to validate that the two tensors are not empty. Furthermore, since the `OP_REQUIRES` macro only stops execution of the current function after setting `ctx->status()` to a non-OK value, callers of helper functions that use `OP_REQUIRES` must check the value of `ctx->status()` before continuing. This doesn't happen in this op's implementation (https://github.com/tensorflow/tensorflow/blob/eccb7ec454e6617738554a255d77f08e60ee0808/tensorflow/core/kernels/linalg/banded_triangular_solve_op.cc#L219), hence the validation that is present is also not effective. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
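
The status-checking problem described in the advisory can be illustrated with a small Python analogy. This is not TensorFlow's C++ code. The `Context` class, the helper, and the stand-in "kernel" are all hypothetical, and the helper here includes an emptiness check purely so that there is a failure to record. The point is only that a helper which records an error and returns does not stop its caller, so the caller has to inspect the status itself.

```python
class Context:
    """Hypothetical stand-in for the op context that carries a status."""

    def __init__(self):
        self.status = "OK"


def validate_inputs(ctx, matrix, rhs):
    # Analogy for a helper built on OP_REQUIRES: on failure it records a
    # non-OK status and returns from *this* function only.
    if len(matrix) == 0 or len(rhs) == 0:
        ctx.status = "INVALID_ARGUMENT: inputs must be non-empty"
        return


def solve_kernel(matrix, rhs, log):
    # Stand-in for the numeric kernel; it only records that it was reached.
    log.append("kernel ran")


def compute_unchecked(ctx, matrix, rhs, log):
    validate_inputs(ctx, matrix, rhs)
    # Flawed pattern: the status is never inspected, so the kernel still runs.
    solve_kernel(matrix, rhs, log)


def compute_checked(ctx, matrix, rhs, log):
    validate_inputs(ctx, matrix, rhs)
    if ctx.status != "OK":
        return  # stop before the kernel touches invalid inputs
    solve_kernel(matrix, rhs, log)


unchecked_log, checked_log = [], []
compute_unchecked(Context(), [], [], unchecked_log)
compute_checked(Context(), [], [], checked_log)
print(unchecked_log)  # ['kernel ran']
print(checked_log)    # []
```

In Python the worst outcome of the unchecked path would be an exception. In native code the same control-flow mistake lets the kernel operate on buffers that failed validation, which is where memory corruption becomes possible.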

Exploitation Scenario

An adversary with access to a shared ML training platform (internal Jupyter hub, Kubeflow pipeline, or MLflow experiment server) submits a crafted TensorFlow model that invokes tf.raw_ops.BandedTriangularSolve with an empty input tensor. Because empty tensors are not rejected and the status set by OP_REQUIRES is never checked, the Eigen implementation runs on the empty buffers and accesses heap memory out of bounds, triggering the overflow. On a successful exploit, the attacker gains code execution as the training worker process, which typically has access to cloud storage credentials, training datasets, and internal ML infrastructure over the network.
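
Since the scenario depends on an empty tensor reaching the kernel, a platform that cannot patch immediately might add a pre-check along these lines. This is a minimal sketch that works on plain shape tuples so it runs without TensorFlow. The function names are hypothetical, and it is a stopgap for code paths you control, not a substitute for upgrading.

```python
def has_zero_elements(shape):
    """True if a tensor with this shape holds no elements (some dim is 0)."""
    return any(dim == 0 for dim in shape)


def check_solve_inputs(matrix_shape, rhs_shape):
    """Raise ValueError if either operand shape describes an empty tensor."""
    for name, shape in (("matrix", matrix_shape), ("rhs", rhs_shape)):
        if has_zero_elements(shape):
            raise ValueError(
                f"{name} has shape {tuple(shape)} and is empty; refusing to solve"
            )


check_solve_inputs((2, 2), (2, 1))  # passes silently

try:
    check_solve_inputs((0,), (2, 1))
except ValueError as err:
    print("rejected:", err)
```

With TensorFlow installed, the shapes would typically come from each tensor's `shape` attribute, and the check would run before the call to `tf.raw_ops.BandedTriangularSolve`. A guard like this only helps where the wrapper is actually used, so it does not protect against users who can call raw ops directly.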

Weaknesses (CWE)

CWE-787 Out-of-bounds Write (Primary)
CWE-120 Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')

CWE-787 — Out-of-bounds Write: The product writes data past the end, or before the beginning, of the intended buffer.

[Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, many languages that perform their own memory management, such as Java and Perl, are not subject to buffer overflows. Other languages, such as Ada and C#, typically provide overflow protection, but the protection can be disabled by the programmer. Be wary that a language's interface to native code may still be subject to overflows, even if the language itself is theoretically safe.
[Architecture and Design] Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Examples include the Safe C String Library (SafeStr) by Messier and Viega [REF-57], and the Strsafe.h library from Microsoft [REF-56]. These libraries provide safer versions of overflow-prone string-handling functions.

Source: MITRE CWE corpus.