CVE-2021-29551: TensorFlow: OOB read DoS in MatrixTriangularSolve kernel

MEDIUM PoC AVAILABLE
Published May 14, 2021
CISO Take

A local attacker with minimal privileges can crash TensorFlow processes by triggering an out-of-bounds read in the MatrixTriangularSolve kernel, causing availability loss. Primary risk is in shared ML infrastructure — Jupyter hubs, training clusters, or multi-tenant serving environments where untrusted users can submit TensorFlow operations. Upgrade to TensorFlow 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4.

Risk Assessment

Medium severity (CVSS 5.5). Exploitability is constrained because the attack vector is local, but risk rises significantly on multi-tenant ML platforms where multiple users or untrusted workloads share TensorFlow runtimes. Attack complexity is low, so any authenticated local user who can run TensorFlow operations can trigger the crash. The flaw is not remotely exploitable unless TensorFlow operations are exposed through an API that accepts user-controlled inputs.

Affected Systems

Package      Ecosystem   Vulnerable Range   Patched
tensorflow   pip         (not listed)       2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0


Severity & Risk

CVSS 3.1
5.5 / 10
EPSS
0.01%
chance of exploitation in 30 days
Higher than 1% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Moderate
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

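Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High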

Recommended Action

  1. Patch: upgrade to TensorFlow 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4 — all contain the fix (commit 480641e).

  2. Isolate ML workloads with containers and namespace separation on shared platforms.

  3. Apply resource limits (CPU/memory cgroups) to TensorFlow processes to bound crash impact.

  4. Audit pipelines that accept user-supplied model operations or custom kernels.

  5. Detection: monitor for unexpected TF process terminations or repeated job failures involving linalg operations as a signal.
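
As a rough aid for step 1, the following Python sketch checks a version string against the patched releases listed above. The helper names (`parse_version`, `is_patched`) are hypothetical, and two choices are assumptions rather than statements from the advisory: pre-release suffixes such as `rc1` are ignored, and release lines older than 2.1, for which no fix is listed, are conservatively reported as unpatched.

```python
# Patched releases named in the advisory: fixed patch level per 2.x release line.
PATCHED = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
FIRST_FIXED_LINE = (2, 5)  # 2.5.0 and later include the fix


def parse_version(version: str) -> tuple:
    """Return (major, minor, patch) from a string such as '2.4.1' or '2.5.0rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # assumption: ignore suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version: str) -> bool:
    """True if the version is at or above the fixed release for its line."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= FIRST_FIXED_LINE:
        return True
    fixed_patch = PATCHED.get((major, minor))
    # Assumption: lines with no listed fix (older than 2.1) count as unpatched.
    return fixed_patch is not None and patch >= fixed_patch
```

In an environment where TensorFlow is installed, the string to pass would typically be `tensorflow.__version__`.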

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 9 - Risk management system (technical robustness and safety)
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.6 - AI system operation and maintenance
A.9.3 - AI system performance and reliability
NIST AI RMF
MANAGE-2.2 - Mechanisms to sustain the value of deployed AI systems, including AI system function during adversarial conditions
OWASP LLM Top 10
LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2021-29551?

CVE-2021-29551 is an out-of-bounds read in TensorFlow's MatrixTriangularSolve kernel. The kernel keeps executing after one of its input validation checks fails, so a local attacker with minimal privileges can crash the TensorFlow process and cause a loss of availability. The primary risk is in shared ML infrastructure (Jupyter hubs, training clusters, or multi-tenant serving environments) where untrusted users can submit TensorFlow operations. It is fixed in TensorFlow 2.5.0, 2.4.2, 2.3.3, 2.2.3, and 2.1.4.

Is CVE-2021-29551 actively exploited?

No evidence of active exploitation of CVE-2021-29551 is listed here. However, proof-of-concept exploit code is publicly available (indexed in trickest/cve), which lowers the barrier to exploitation.

How to fix CVE-2021-29551?

  1. Patch: upgrade to TensorFlow 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4; all contain the fix (commit 480641e).

  2. Isolate ML workloads with containers and namespace separation on shared platforms.

  3. Apply resource limits (CPU/memory cgroups) to TensorFlow processes to bound crash impact.

  4. Audit pipelines that accept user-supplied model operations or custom kernels.

  5. Detection: monitor for unexpected TF process terminations or repeated job failures involving linalg operations as a signal.

What systems are affected by CVE-2021-29551?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, notebook environments, shared ML platforms.

What is the CVSS score for CVE-2021-29551?

CVE-2021-29551 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.01%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. The implementation of `MatrixTriangularSolve` (https://github.com/tensorflow/tensorflow/blob/8cae746d8449c7dda5298327353d68613f16e798/tensorflow/core/kernels/linalg/matrix_triangular_solve_op_impl.h#L160-L240) fails to terminate kernel execution if one validation condition fails. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
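
The description names a general bug class: a validation failure is recorded but execution continues. The advisory does not quote the kernel source, so the sketch below is only a toy Python model of that pattern, not TensorFlow's C++ code. `ToyContext`, `validate`, `compute_vulnerable`, `compute_fixed`, and the rank checks are all hypothetical and chosen for illustration; in Python the bad access raises `IndexError`, whereas native code would read out of bounds.

```python
class ToyContext:
    """Hypothetical stand-in for a kernel context that records the first error."""

    def __init__(self):
        self.error = None

    def require(self, condition: bool, message: str) -> bool:
        if not condition and self.error is None:
            self.error = message
        return condition


def validate(ctx, a_shape, b_shape):
    # The helper returns nothing: a failed check is only recorded on the context.
    if not ctx.require(len(a_shape) >= 2, "lhs must have rank >= 2"):
        return
    ctx.require(len(b_shape) >= 2, "rhs must have rank >= 2")


def compute_vulnerable(ctx, a_shape, b_shape):
    validate(ctx, a_shape, b_shape)
    # Missing status check: execution continues with shapes that failed validation.
    return a_shape[-2], b_shape[-1]


def compute_fixed(ctx, a_shape, b_shape):
    validate(ctx, a_shape, b_shape)
    if ctx.error is not None:
        return None  # stop before touching the invalid shapes
    return a_shape[-2], b_shape[-1]
```

Calling `compute_vulnerable(ToyContext(), (3,), (3, 3))` records the error and then still indexes the rank-1 shape, while `compute_fixed` returns early with the same inputs.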

Exploitation Scenario

An adversary with a low-privilege account on a shared ML training cluster submits a job that calls MatrixTriangularSolve with crafted inputs that fail one of the kernel's validation checks. Because the kernel does not stop executing after that failure, it goes on to process the invalid inputs and reads out of bounds, which crashes the TensorFlow process. On JupyterHub or a SageMaker-like shared environment, this disrupts colocated training jobs, and the attacker can repeat it to sustain the denial of service, forcing job restarts and wasting compute resources.

Weaknesses (CWE)

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
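
To show how this vector maps to the 5.5 base score, here is a minimal Python sketch of the CVSS v3.1 base score calculation. It assumes the metric weights and the `Roundup` rule published in the CVSS v3.1 specification, and it handles only Scope Unchanged vectors such as this one; `base_score` and `roundup` are illustrative names.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for Scope Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer method from the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    # Skip the "CVSS:3.1" prefix and split the remaining "NAME:VALUE" pairs.
    fields = dict(part.split(":") for part in vector.split("/")[1:])
    if fields["S"] != "U":
        raise ValueError("this sketch only handles Scope Unchanged")
    w = {name: WEIGHTS[name][fields[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))
```

For this vector the impact sub-score is about 3.60 and the exploitability sub-score about 1.83; their sum, about 5.43, rounds up to 5.5.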

Timeline

Published
May 14, 2021
Last Modified
November 21, 2024
First Seen
May 14, 2021
