CVE-2021-29545: TensorFlow: heap OOB write in sparse tensor DoS

MEDIUM PoC AVAILABLE
Published May 14, 2021
CISO Take

A local attacker with minimal privileges can crash TensorFlow processes by submitting malformed sparse tensors, triggering an out-of-bounds heap write via the sparse-to-CSR matrix conversion kernel. The local-only attack vector limits broad exposure, but multi-tenant ML platforms and shared data science environments (Jupyter hubs, model serving endpoints accepting sparse input) carry real denial-of-service risk. Patch to TensorFlow 2.5.0 or apply the cherry-picked fixes for 2.1.x–2.4.x immediately.

What is the risk?

Medium risk in isolation (CVSS 5.5, AV:L/PR:L), but elevated in shared or multi-tenant ML infrastructure. Exploitability within local scope is trivial—crafting a malformed sparse tensor index requires no deep ML knowledge. Impact is confined to availability; no confidentiality or integrity risk. Organizations running TensorFlow in Jupyter environments, managed notebook services, or model serving APIs that accept sparse matrix inputs should treat this as higher operational priority than the base score suggests.

What systems are affected?

Package | Ecosystem | Vulnerable Range | Patched
TensorFlow | pip | not listed | 2.5.0 (fix also cherry-picked into 2.1.4, 2.2.3, 2.3.3, 2.4.2)

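You are affected if you run a TensorFlow release that predates the fix: anything earlier than 2.5.0 that is not a patched maintenance release (2.1.4, 2.2.3, 2.3.3, 2.4.2, or later in those lines).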
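As a rough aid for inventory checks, the version rule can be sketched in a few lines of Python. The function name `is_patched` and the table `PATCHED_MAINTENANCE` are hypothetical helpers, not part of TensorFlow. The sketch assumes plain `MAJOR.MINOR.PATCH` version strings and treats release lines older than 2.1 as unpatched, which is an assumption rather than something the advisory states.

```python
import re

# Minimum patched patch-level per maintenance line, taken from the advisory text.
PATCHED_MAINTENANCE = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def is_patched(version: str) -> bool:
    """Return True if a TensorFlow version string includes the fix."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    if (major, minor) >= (2, 5):
        return True
    minimum_patch = PATCHED_MAINTENANCE.get((major, minor))
    # Assumption: lines with no listed cherry-pick (older than 2.1) are unpatched.
    return minimum_patch is not None and patch >= minimum_patch


if __name__ == "__main__":
    for candidate in ("2.0.4", "2.3.2", "2.4.2", "2.5.0"):
        print(candidate, "patched" if is_patched(candidate) else "VULNERABLE")
```

In practice the string would come from `tensorflow.__version__` or from a dependency lock file. Pre-release suffixes such as `rc0` are ignored by this sketch and would need separate handling.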

How severe is it?

CVSS 3.1
5.5 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 9% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Trivial
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Local
AC Low
PR Low
UI None
S Unchanged
C None
I None
A High

What should I do?

  1. Upgrade TensorFlow to 2.5.0, or to a patched maintenance release (2.1.4, 2.2.3, 2.3.3, or 2.4.2). These releases include the fix, commit 1e922ccdf6bf46a3a52641f99fd47d54c1decd13.
  2. As a workaround, validate sparse tensor indices server-side before passing them to CSR conversion: reject any input where max(indices[:,0]) >= expected_num_rows.
  3. Run TensorFlow serving workers in isolated containers with restart policies to limit how long a DoS lasts.
  4. Alert on unexpected TensorFlow process exits in serving infrastructure as a detection signal.
  5. Audit production code for use of tf.raw_ops.SparseToCsrSparseMatrix and related sparse conversion APIs exposed to external input.
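The index-validation workaround can be sketched in plain Python, without importing TensorFlow. The function name `validate_sparse_indices` is hypothetical. The sketch is stricter than the rule quoted above: it checks every dimension against `dense_shape` and also rejects negative values, on the assumption that any out-of-range index should be refused before it reaches the kernel.

```python
def validate_sparse_indices(indices, dense_shape):
    """Raise ValueError if any sparse index falls outside dense_shape.

    indices:     sequence of index tuples, one per non-zero value
    dense_shape: sequence of dimension sizes; for a rank-2 matrix,
                 dense_shape[0] is the expected number of rows
    """
    rank = len(dense_shape)
    for position, index in enumerate(indices):
        if len(index) != rank:
            raise ValueError(
                f"index {position} has rank {len(index)}, expected {rank}"
            )
        for dim, (value, bound) in enumerate(zip(index, dense_shape)):
            # Reject both negative values and values >= the dimension size.
            if not 0 <= value < bound:
                raise ValueError(
                    f"index {position} is out of range in dimension {dim}: "
                    f"{value} not in [0, {bound})"
                )


if __name__ == "__main__":
    validate_sparse_indices([[0, 0], [1, 2]], dense_shape=[2, 3])  # accepted
    try:
        validate_sparse_indices([[5, 0]], dense_shape=[2, 3])
    except ValueError as error:
        print("rejected:", error)
```

A serving layer would call such a check on the request payload before building the sparse tensor, and return a client error instead of forwarding the input.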


Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.6 - AI system testing and validation
NIST AI RMF
MANAGE 2.2 - Mechanisms are in place to respond to and recover from AI risks
OWASP LLM Top 10
LLM06:2025 - Excessive Agency

Frequently Asked Questions

What is CVE-2021-29545?

A local attacker with minimal privileges can crash TensorFlow processes by submitting malformed sparse tensors, triggering an out-of-bounds heap write via the sparse-to-CSR matrix conversion kernel. The local-only attack vector limits broad exposure, but multi-tenant ML platforms and shared data science environments (Jupyter hubs, model serving endpoints accepting sparse input) carry real denial-of-service risk. Patch to TensorFlow 2.5.0 or apply the cherry-picked fixes for 2.1.x–2.4.x immediately.

Is CVE-2021-29545 actively exploited?

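Proof-of-concept exploit code for CVE-2021-29545 is publicly available (indexed in trickest/cve), which raises the risk of exploitation. The EPSS model estimates roughly a 0.2% chance of exploitation within 30 days.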

How to fix CVE-2021-29545?

1) Upgrade TensorFlow to 2.5.0, or to a patched maintenance release (2.1.4, 2.2.3, 2.3.3, or 2.4.2); these include the fix, commit 1e922ccdf6bf46a3a52641f99fd47d54c1decd13. 2) As a workaround, validate sparse tensor indices server-side before passing them to CSR conversion: reject any input where max(indices[:,0]) >= expected_num_rows. 3) Run TensorFlow serving workers in isolated containers with restart policies to limit how long a DoS lasts. 4) Alert on unexpected TensorFlow process exits in serving infrastructure as a detection signal. 5) Audit production code for use of tf.raw_ops.SparseToCsrSparseMatrix and related sparse conversion APIs exposed to external input.
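For the audit step, a simple text search is often enough as a first pass. The sketch below uses only the standard library; `find_usages` and `scan_tree` are hypothetical helper names. It searches for the one identifier named in this advisory, so wrappers or aliases that reach the same kernel under a different name would need to be added to the search by hand.

```python
from pathlib import Path

# The raw op named in the advisory; extend as needed for local wrappers.
NEEDLE = "SparseToCsrSparseMatrix"


def find_usages(source: str, needle: str = NEEDLE):
    """Return (line_number, stripped_line) for each line mentioning needle."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if needle in line
    ]


def scan_tree(root: str):
    """Scan every .py file under root and map file path -> list of usages."""
    hits = {}
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        found = find_usages(text)
        if found:
            hits[str(path)] = found
    return hits


if __name__ == "__main__":
    for file_path, usages in scan_tree(".").items():
        for line_number, line in usages:
            print(f"{file_path}:{line_number}: {line}")
```

Each hit still needs manual review to decide whether the indices it passes can be influenced by external input.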

What systems are affected by CVE-2021-29545?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML notebooks.

What is the CVSS score for CVE-2021-29545?

CVE-2021-29545 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.19%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, shared ML notebooks

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0029 Denial of AI Service
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.2.6
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM06:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a denial of service via a `CHECK`-fail in converting sparse tensors to CSR Sparse matrices. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/800346f2c03a27e182dd4fba48295f65e7790739/tensorflow/core/kernels/sparse/kernels.cc#L66) does a double redirection to access an element of an array allocated on the heap. If the value at `indices(i, 0)` is such that `indices(i, 0) + 1` is outside the bounds of `csr_row_ptr`, this results in writing outside of bounds of heap allocated data. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
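The double redirection can be illustrated with a simplified Python model of the row-pointer construction for a rank-2 matrix. This is a sketch of the pattern the advisory describes, not the TensorFlow source; `build_csr_row_ptr` is a hypothetical name. In C++ the write `csr_row_ptr[row + 1] += 1` has no automatic bounds check, so the explicit check below stands in for the kind of validation the kernel needs.

```python
def build_csr_row_ptr(row_indices, num_rows):
    """Build the CSR row-pointer array from the row index of each non-zero.

    row_indices plays the role of indices(i, 0) in the advisory.
    The result has num_rows + 1 entries, like csr_row_ptr.
    """
    csr_row_ptr = [0] * (num_rows + 1)
    for row in row_indices:
        # Without this check, an attacker-controlled row value selects
        # the slot that is written: the out-of-bounds write in C++.
        if not 0 <= row < num_rows:
            raise ValueError(f"row index {row} outside [0, {num_rows})")
        csr_row_ptr[row + 1] += 1  # index into the array via another array
    # Prefix sum turns per-row counts into row start offsets.
    for i in range(num_rows):
        csr_row_ptr[i + 1] += csr_row_ptr[i]
    return csr_row_ptr


if __name__ == "__main__":
    print(build_csr_row_ptr([0, 0, 2], num_rows=3))  # [0, 2, 2, 3]
    try:
        build_csr_row_ptr([0, 7], num_rows=3)
    except ValueError as error:
        print("rejected:", error)
```

With three rows the array has four slots, so a row value of 7 would target slot 8, well past the end of the allocation.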

Exploitation Scenario

An adversary with access to a shared ML platform—such as a tenant on a multi-user Jupyter hub or a client of a model serving endpoint that accepts sparse feature inputs—constructs a sparse tensor where indices(i, 0) + 1 exceeds the allocated bounds of the csr_row_ptr array. Submitting this tensor to any operation invoking the SparseToCsrSparseMatrix kernel triggers the out-of-bounds heap write, causing an immediate CHECK-fail and process crash. In a model serving context (e.g., TensorFlow Serving behind an API), the attacker can repeatedly submit these payloads to keep the inference worker down, achieving sustained denial of service against the ML endpoint with no authentication bypass required beyond API access.
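Because the attack in this scenario relies on crashing the worker repeatedly, a crash-loop counter is one way to turn process exits into a detection signal. The class below is a minimal sketch with hypothetical names and thresholds; it assumes the supervisor can report each worker exit code and a timestamp, and it treats any non-zero exit code as a crash.

```python
from collections import deque


class CrashLoopDetector:
    """Flag a worker that exits abnormally too often within a time window."""

    def __init__(self, max_crashes=3, window_seconds=60.0):
        self.max_crashes = max_crashes
        self.window_seconds = window_seconds
        self._crash_times = deque()

    def record_exit(self, exit_code, timestamp):
        """Record one process exit; return True if an alert should fire."""
        if exit_code == 0:
            return False  # clean shutdown, not a crash
        self._crash_times.append(timestamp)
        # Drop crashes that have fallen out of the sliding window.
        while timestamp - self._crash_times[0] > self.window_seconds:
            self._crash_times.popleft()
        return len(self._crash_times) >= self.max_crashes


if __name__ == "__main__":
    detector = CrashLoopDetector(max_crashes=3, window_seconds=60.0)
    # Three abnormal exits within one minute trigger the alert on the third.
    for seconds, code in [(0.0, -6), (10.0, -6), (20.0, -6)]:
        print(seconds, detector.record_exit(code, seconds))
```

The exit code -6 in the usage example is what Python's `subprocess` module reports on POSIX for a process killed by SIGABRT, which is how an aborting process typically appears to its supervisor.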

Weaknesses (CWE)

CWE-131 — Incorrect Calculation of Buffer Size: The product does not correctly calculate the size to be used when allocating a buffer, which could lead to a buffer overflow.

  • [Implementation] When allocating a buffer for the purpose of transforming, converting, or encoding an input, allocate enough memory to handle the largest possible encoding. For example, in a routine that converts "&" characters to "&amp;" for HTML entity encoding, the output buffer needs to be at least 5 times as large as the input buffer.
  • [Implementation] Understand the programming language's underlying representation and how it interacts with numeric calculation (CWE-681). Pay close attention to byte size discrepancies, precision, signed/unsigned distinctions, truncation, conversion and casting between types, "not-a-number" calculations, and how the language handles numbers that are too large or too small for its underlying representation. [REF-7] Also be careful to account for 32-bit, 64-bit, and other potential differences that may affect the numeric representation.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
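The 5.5 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The sketch below handles only vectors with Scope Unchanged (S:U), which is a deliberate simplification; the function names are hypothetical, and the metric weights are those published in the CVSS v3.1 specification.

```python
import math

# CVSS v3.1 metric weights (Privileges Required values are for Scope Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS v3.1 spec."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a CVSS:3.1 vector with Scope Unchanged."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise NotImplementedError("this sketch only handles S:U")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

For this vector the impact sub-score is about 3.6 and the exploitability sub-score about 1.8; their sum, about 5.43, rounds up to 5.5.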

Timeline

Published
May 14, 2021
Last Modified
November 21, 2024
First Seen
May 14, 2021

Related Vulnerabilities