CVE-2021-29551: TensorFlow OOB read DoS

CISO Take

A local attacker with minimal privileges can crash TensorFlow processes by triggering an out-of-bounds read in the MatrixTriangularSolve kernel, causing availability loss. Primary risk is in shared ML infrastructure — Jupyter hubs, training clusters, or multi-tenant serving environments where untrusted users can submit TensorFlow operations. Upgrade to TensorFlow 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4.

What is the risk?

Medium severity with constrained exploitability due to local attack vector requirement. Risk escalates significantly in multi-tenant ML platforms where multiple users or untrusted workloads share TensorFlow runtimes. Low attack complexity means any authenticated local user can trigger it without specialized knowledge. Not remotely exploitable unless TensorFlow operations are exposed via an API accepting user-controlled inputs.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	< 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1	2.5.0, 2.4.2, 2.3.3, 2.2.3, 2.1.4

Do you use a TensorFlow release older than 2.5.0 that lacks the 2.4.2, 2.3.3, 2.2.3, or 2.1.4 backport? You're affected.

How severe is it?

CVSS 3.1

5.5 / 10

EPSS

0.2%

chance of exploitation in 30 days

Higher than 12% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Moderate

Exploitation Confidence

medium

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Local

AC Low

PR Low

UI None

S Unchanged

C None

I None

A High
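
The 5.5 base score follows from the metric values above. As a minimal sketch, the CVSS v3.1 base-score equations for a Scope: Unchanged vector can be worked through in Python. The numeric weights are the ones published in the FIRST CVSS v3.1 specification; the function and variable names are illustrative only.

```python
import math

# Metric weights from the CVSS v3.1 specification (valid for Scope: Unchanged).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x):
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score_unchanged(av, ac, pr, ui, c, i, a):
    """Base score for a vector whose Scope metric is Unchanged."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:L / AC:L / PR:L / UI:N / S:U / C:N / I:N / A:H
score = base_score_unchanged("L", "L", "L", "N", "N", "N", "H")
print(score)  # 5.5
```

The whole score comes from the availability impact (about 3.6) plus a low exploitability term (about 1.8) that reflects the local attack vector.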

What should I do?

5 steps

Patch: upgrade to TensorFlow 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4 — all contain the fix (commit 480641e).
Isolate ML workloads with containers and namespace separation on shared platforms.
Apply resource limits (CPU/memory cgroups) to TensorFlow processes to bound crash impact.
Audit pipelines that accept user-supplied model operations or custom kernels.
Detection: monitor for unexpected TF process terminations or repeated job failures involving linalg operations as a signal.
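
The patch step can be verified with a small script. This is a sketch under stated assumptions: the fixed versions are the ones listed above, release lines older than 2.1 are treated as unpatched because no backport is listed for them, and pre-release suffixes such as `rc0` are ignored. The helper names `is_patched` and `installed_tensorflow_version` are hypothetical.

```python
import re
from importlib import metadata

# First fixed patch release for each backported minor line (from the advisory).
FIXED_PATCH = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def is_patched(version):
    """Return True if a TensorFlow version string includes the fix."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) >= (2, 5):
        return True  # 2.5.0 and later ship the fix
    if (major, minor) in FIXED_PATCH:
        return patch >= FIXED_PATCH[(major, minor)]
    # Assumption: lines older than 2.1 received no backport.
    return False


def installed_tensorflow_version():
    """Return the installed version of the 'tensorflow' distribution, or None.

    Variants published under other distribution names are not checked here.
    """
    try:
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_tensorflow_version()
    if found is None:
        print("tensorflow distribution not found")
    else:
        print(found, "patched" if is_patched(found) else "VULNERABLE")
```

Running this inside each image or virtual environment used on a shared platform gives a quick inventory of which runtimes still need the upgrade.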

How is it classified?

DoS, Framework, Inference
AML.T0010.001 - AI Software
AML.T0029 - Denial of AI Service
AML.T0034 - Cost Harvesting
AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 9 - Risk management system: technical robustness and safety
Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2.6 - AI system operation and maintenance
A.9.3 - AI system performance and reliability

NIST AI RMF

MANAGE-2.2 - Mechanisms to sustain the value of deployed AI systems
RMF.MANAGE-2.2 - Mechanisms for sustaining AI system function during adversarial conditions

OWASP LLM Top 10

LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2021-29551?

CVE-2021-29551 is an out-of-bounds read (CWE-125) in TensorFlow's MatrixTriangularSolve kernel. The kernel keeps executing after an input validation check fails, so a local attacker with low privileges can crash the TensorFlow process and cause availability loss. The fix is included in TensorFlow 2.5.0, 2.4.2, 2.3.3, 2.2.3, and 2.1.4.

Is CVE-2021-29551 actively exploited?

Public proof-of-concept code for CVE-2021-29551 is indexed in trickest/cve, which raises the risk of exploitation. The EPSS estimate of exploitation within 30 days is about 0.2%.

How to fix CVE-2021-29551?

1. Patch: upgrade to TensorFlow 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4; all contain the fix (commit 480641e).
2. Isolate ML workloads with containers and namespace separation on shared platforms.
3. Apply resource limits (CPU/memory cgroups) to TensorFlow processes to bound crash impact.
4. Audit pipelines that accept user-supplied model operations or custom kernels.
5. Detection: monitor for unexpected TF process terminations or repeated job failures involving linalg operations as a signal.

What systems are affected by CVE-2021-29551?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, notebook environments, shared ML platforms.

What is the CVSS score for CVE-2021-29551?

CVE-2021-29551 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.22%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, notebook environments, shared ML platforms

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0029 Denial of AI Service

AML.T0034 Cost Harvesting

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 9, Article 15

ISO 42001: A.6.2.6, A.9.3

NIST AI RMF: MANAGE-2.2, RMF.MANAGE-2.2

OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. The implementation of `MatrixTriangularSolve` (https://github.com/tensorflow/tensorflow/blob/8cae746d8449c7dda5298327353d68613f16e798/tensorflow/core/kernels/linalg/matrix_triangular_solve_op_impl.h#L160-L240) fails to terminate kernel execution if one validation condition fails. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
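
The flaw the advisory describes is a control-flow one: a validation helper records a failure, but the caller keeps running. The sketch below is a Python analogy of that pattern, not TensorFlow source. `Context`, `validate`, `compute_buggy`, and `compute_fixed` are hypothetical names, and Python raises `IndexError` at the point where a C++ kernel would read past the end of a buffer.

```python
class Context:
    """Hypothetical stand-in for a kernel context that records the first error."""

    def __init__(self):
        self.error = None

    def fail(self, message):
        if self.error is None:
            self.error = message


def validate(ctx, coeffs, rhs):
    """Record an error on ctx if the inputs do not line up."""
    if len(coeffs) != len(rhs):
        ctx.fail("dimension mismatch")
        return  # this only leaves the helper, not the caller


def compute_buggy(ctx, coeffs, rhs):
    validate(ctx, coeffs, rhs)
    # Missing check: execution continues even though validation failed.
    return [rhs[i] / coeffs[i] for i in range(len(coeffs))]


def compute_fixed(ctx, coeffs, rhs):
    validate(ctx, coeffs, rhs)
    if ctx.error is not None:
        return None  # stop as soon as validation has failed
    return [rhs[i] / coeffs[i] for i in range(len(coeffs))]


if __name__ == "__main__":
    ctx = Context()
    print(compute_fixed(ctx, [1.0, 2.0, 4.0], [2.0, 4.0]), ctx.error)
    try:
        compute_buggy(Context(), [1.0, 2.0, 4.0], [2.0, 4.0])
    except IndexError as exc:
        print("buggy variant read past the end:", exc)
```

The fixed variant differs only by the status check after the helper call, which is the general shape of remedy for this class of bug.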

Exploitation Scenario

An adversary with a low-privilege account on a shared ML training cluster submits a job that calls MatrixTriangularSolve with crafted inputs that fail validation. Because the kernel does not stop after the failed check, it continues with the invalid inputs and reads out of bounds, which crashes the TensorFlow process. On a JupyterHub or SageMaker-like shared environment, this disrupts colocated training jobs and can be repeated to cause sustained denial of service, forcing job restarts and wasting compute resources.
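
On platforms that accept user-submitted operations, one extra layer is to reject malformed shapes before they reach TensorFlow. The sketch below assumes the documented shape contract of `tf.linalg.triangular_solve` (matrix `[..., M, M]`, right-hand side `[..., M, N]`) and works on plain shape tuples. The function name is hypothetical, batch dimensions are not checked, and this is not a substitute for upgrading.

```python
def triangular_solve_shapes_ok(matrix_shape, rhs_shape):
    """Return True if the shapes match [..., M, M] and [..., M, N].

    Only rank and the two innermost dimensions are checked; leading
    batch dimensions are left to the framework.
    """
    if len(matrix_shape) < 2 or len(rhs_shape) < 2:
        return False  # both inputs need at least two dimensions
    rows, cols = matrix_shape[-2], matrix_shape[-1]
    if rows != cols:
        return False  # coefficient matrix must be square
    return rhs_shape[-2] == cols  # right-hand side rows must equal M


if __name__ == "__main__":
    print(triangular_solve_shapes_ok((3, 3), (3, 2)))  # well-formed
    print(triangular_solve_shapes_ok((3,), (3, 2)))    # rank too low
```

A job gateway could call such a check on request metadata and refuse the job early, which also produces a clean audit trail of rejected inputs.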

Weaknesses (CWE)

CWE-125 Out-of-bounds Read

CWE-125 — Out-of-bounds Read: The product reads data past the end, or before the beginning, of the intended buffer.

[Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation.
[Architecture and Design] Use a language that provides appropriate memory abstractions.

Source: MITRE CWE corpus.