CVE-2021-29521: TensorFlow: DoS crash via negative sparse tensor shape

MEDIUM PoC AVAILABLE
Published May 14, 2021
CISO Take

A local attacker or malicious ML workload that can invoke TensorFlow ops can crash an unpatched TensorFlow process by passing a negative value in the dense_shape argument of SparseCountSparseOutput. The result is a segfault that the application cannot catch or recover from. Upgrade to TensorFlow 2.5.0, 2.4.2, or 2.3.3 immediately on all training infrastructure and model-serving nodes. Risk is elevated in multi-tenant ML environments (shared notebooks, inference APIs) where untrusted users can submit tensor operations.

What is the risk?

Medium risk in isolated training environments; elevated in shared or exposed deployments. CVSS 5.5 (Local/Low complexity/Low privilege) understates real-world exposure in ML platforms where notebook users or API callers can invoke raw TF ops. No confidentiality or integrity impact, but availability impact is high — a single malformed tensor call terminates the process. Not in CISA KEV and no evidence of active exploitation, but the exploit primitive is trivially reproducible from the public advisory.

What systems are affected?

Package: TensorFlow
Ecosystem: pip
Vulnerable range: 2.3.x before 2.3.3; 2.4.x before 2.4.2
Patched: 2.5.0, 2.4.2, 2.3.3
Package stats: 195.8K, OpenSSF Scorecard 7.1, 3.7K dependents, 4% patched, ~1372 days to patch

Do you run TensorFlow 2.3.x before 2.3.3 or 2.4.x before 2.4.2? You're affected.

How severe is it?

CVSS 3.1: 5.5 / 10
EPSS: 0.2% chance of exploitation in 30 days (higher than 9% of all CVEs)
Exploitation status: Exploit available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation confidence: medium
Evidence: public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

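Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High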

What should I do?

5 steps
  1. Patch: Upgrade to TensorFlow 2.5.0, 2.4.2 (2.4.x branch), or 2.3.3 (2.3.x branch). Verify with `pip show tensorflow`.

  2. Input validation: Add explicit checks that all elements of dense_shape tensors are non-negative before passing them to SparseCountSparseOutput or to any other op that takes a dense shape.

  3. Resilience: Run TF inference/training processes under process supervisors (systemd, Kubernetes restartPolicy=Always) to auto-recover from crashes. Restarts shorten downtime but do not stop an attacker who keeps resubmitting the crashing input.

  4. Least privilege: Restrict which users or API clients can invoke raw TF ops in shared environments.

  5. Detection: Alert on repeated abnormal process terminations of TF serving workers; correlate with input payloads containing negative shape values.
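Step 2 can be sketched in Python as a thin guard around the op call. This is a minimal sketch under stated assumptions, not TensorFlow's patch: `validate_dense_shape` and `checked_sparse_count` are hypothetical helper names, the op is injected as a callable (in practice `tf.raw_ops.SparseCountSparseOutput`) so the guard can be exercised without TensorFlow, and `dense_shape` is assumed to be a plain sequence of integers (an eager `tf.Tensor` would first be converted with `.numpy().tolist()`).

```python
import operator
from typing import Any, Callable, Sequence


def validate_dense_shape(dense_shape: Sequence[Any]) -> None:
    """Hypothetical helper: raise unless every dimension is a non-negative integer."""
    for i, dim in enumerate(dense_shape):
        # operator.index accepts Python and NumPy integers but raises TypeError for floats.
        value = operator.index(dim)
        if value < 0:
            raise ValueError(f"dense_shape[{i}] is {value}; dimensions must be non-negative")


def checked_sparse_count(op: Callable[..., Any], *, dense_shape: Sequence[Any], **kwargs: Any) -> Any:
    """Hypothetical wrapper: validate dense_shape, then forward all arguments to op."""
    validate_dense_shape(dense_shape)
    return op(dense_shape=dense_shape, **kwargs)


if __name__ == "__main__":
    # Stand-in op for demonstration; real code would pass tf.raw_ops.SparseCountSparseOutput.
    def fake_op(**kwargs: Any) -> str:
        return "op ran"

    print(checked_sparse_count(fake_op, dense_shape=[2, 10]))  # op ran
    try:
        checked_sparse_count(fake_op, dense_shape=[-1, 10])
    except ValueError as err:
        print("rejected:", err)
```

A guard like this only helps if untrusted callers are forced through it, which is why it pairs with step 4. Inside a `tf.function`, where tensor values are not known at trace time, `tf.debugging.assert_non_negative(dense_shape)` is a graph-compatible way to express the same check.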

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

ISO 42001
8.2 - AI risk management process
NIST AI RMF
GOVERN 6.1 - Policies and procedures for AI supply chain risk
MANAGE 2.2 - Mechanisms to sustain reliable operation
OWASP LLM Top 10
LLM05:2025 - Improper Output Handling / Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2021-29521?

CVE-2021-29521 is a denial-of-service vulnerability in TensorFlow. Passing a negative value in the `dense_shape` argument of `tf.raw_ops.SparseCountSparseOutput` breaks `std::vector` invariants in the kernel and crashes the process with a segmentation fault. It is fixed in TensorFlow 2.5.0, 2.4.2, and 2.3.3, and the risk is highest in multi-tenant ML environments where untrusted users can submit tensor operations.

Is CVE-2021-29521 actively exploited?

No. CVE-2021-29521 is not listed in CISA KEV and there is no evidence of active exploitation. However, proof-of-concept exploit code is publicly available, which increases the risk of exploitation.

How to fix CVE-2021-29521?

Upgrade to TensorFlow 2.5.0, 2.4.2 (2.4.x branch), or 2.3.3 (2.3.x branch) and verify with `pip show tensorflow`. Complementary mitigations: check that every element of `dense_shape` is non-negative before calling SparseCountSparseOutput, run TF workers under a process supervisor (systemd, Kubernetes restartPolicy=Always), restrict which users or API clients can invoke raw TF ops, and alert on repeated abnormal terminations of TF serving workers.
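The `pip show tensorflow` check can also be scripted for a fleet audit. The sketch below is a hypothetical helper, not an official tool: `is_patched` encodes only the fixed releases named in this advisory (2.3.3 on the 2.3.x line, 2.4.2 on the 2.4.x line, and 2.5.0 or later), conservatively reports every older line as unpatched, and accepts only plain `X.Y.Z` release strings. It reads the installed version with the standard library's `importlib.metadata`; installs that use a different distribution name, such as `tensorflow-cpu`, would need that name passed instead.

```python
import re
from importlib.metadata import PackageNotFoundError, version

_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def is_patched(ver: str) -> bool:
    """Hypothetical check: True if ver is at or above a fixed release for its line."""
    m = _RELEASE.match(ver)
    if m is None:
        # Pre-releases and local builds (e.g. "2.5.0rc0") need a manual check.
        raise ValueError(f"unsupported version string: {ver!r}")
    v = tuple(int(part) for part in m.groups())
    if v >= (2, 5, 0):
        return True
    if v >= (2, 4, 0):
        return v >= (2, 4, 2)
    if v >= (2, 3, 0):
        return v >= (2, 3, 3)
    # No backport to older lines is listed in the advisory; report them as unpatched.
    return False


def check_installed(dist: str = "tensorflow") -> str:
    """Report the installed version of dist and whether it is patched."""
    try:
        installed = version(dist)
    except PackageNotFoundError:
        return f"{dist} is not installed"
    status = "patched" if is_patched(installed) else "NOT patched"
    return f"{dist} {installed}: {status}"


if __name__ == "__main__":
    print(check_installed())
```

Comparing integer tuples rather than raw strings avoids the classic mistake of treating "2.10.0" as older than "2.5.0".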

What systems are affected by CVE-2021-29521?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML environments, data preprocessing pipelines.

What is the CVSS score for CVE-2021-29521?

CVE-2021-29521 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.19%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, shared ML environments, data preprocessing pipelines

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0029 Denial of AI Service
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

ISO 42001: 8.2
NIST AI RMF: GOVERN 6.1, MANAGE 2.2
OWASP LLM Top 10: LLM05:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. Specifying a negative dense shape in `tf.raw_ops.SparseCountSparseOutput` results in a segmentation fault being thrown out from the standard library as `std::vector` invariants are broken. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/8f7b60ee8c0206a2c99802e3a4d1bb55d2bc0624/tensorflow/core/kernels/count_ops.cc#L199-L213) assumes the first element of the dense shape is always positive and uses it to initialize a `BatchedMap<T>` (i.e., `std::vector<absl::flat_hash_map<int64,T>>` (https://github.com/tensorflow/tensorflow/blob/8f7b60ee8c0206a2c99802e3a4d1bb55d2bc0624/tensorflow/core/kernels/count_ops.cc#L27)) data structure. If the `shape` tensor has more than one element, `num_batches` is the first value in `shape`. Ensuring that the `dense_shape` argument is a valid tensor shape (that is, all elements are non-negative) solves this issue. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2 and TensorFlow 2.3.3.
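The root cause described in the advisory can be modeled in a few lines of Python. This is an illustrative model, not the kernel source: `num_batches_for` follows the advisory's rule that a multi-element `shape` contributes its first value as `num_batches` (treating a one-element shape as a single batch is an assumption based on the linked code), and `as_size_t` applies the C++ rule that converting a negative integer to an unsigned type wraps modulo 2^N, assuming a 64-bit `size_t`. If the `BatchedMap<T>` is constructed with `num_batches` as its element count, as assumed here, a shape of `[-1, 10]` requests an impossibly large vector rather than an empty one.

```python
from typing import Sequence

SIZE_T_BITS = 64  # assumption: a 64-bit platform where size_t is 64 bits wide


def num_batches_for(dense_shape: Sequence[int]) -> int:
    """Model of how the kernel picks the batch count (per the advisory)."""
    if len(dense_shape) == 1:
        return 1  # assumed: a one-element shape is treated as a single batch
    return dense_shape[0]  # otherwise the first dimension, with no sign check


def as_size_t(n: int) -> int:
    """C++ converts a signed count to an unsigned size_t modulo 2**SIZE_T_BITS."""
    return n % (1 << SIZE_T_BITS)


def validated_num_batches(dense_shape: Sequence[int]) -> int:
    """The fix in spirit: reject any negative dimension before sizing storage."""
    if any(d < 0 for d in dense_shape):
        raise ValueError(f"invalid dense_shape {list(dense_shape)}: negative dimension")
    return num_batches_for(dense_shape)


if __name__ == "__main__":
    n = num_batches_for([-1, 10])
    print(n, as_size_t(n))  # -1 18446744073709551615
```

The validated variant mirrors the advisory's remedy of checking that all elements are non-negative, rather than only the first one, so a shape such as `[3, -5]` is rejected as well.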

Exploitation Scenario

An adversary with access to a shared ML notebook environment or a model-serving API that exposes TF raw ops calls `tf.raw_ops.SparseCountSparseOutput(indices=..., values=..., dense_shape=[-1, 10], weights=..., binary_output=False)`. The negative first element (-1) is passed directly as `num_batches` to initialize a `std::vector<absl::flat_hash_map>`, violating vector invariants and triggering a segfault. The TF process crashes immediately with no exception handling possible at the application layer. In a Kubernetes-hosted inference cluster, the attacker can loop this call to repeatedly crash pods faster than they restart, achieving sustained denial of service against the ML serving endpoint.

Weaknesses (CWE)

CWE-131 — Incorrect Calculation of Buffer Size: The product does not correctly calculate the size to be used when allocating a buffer, which could lead to a buffer overflow.

  • [Implementation] When allocating a buffer for the purpose of transforming, converting, or encoding an input, allocate enough memory to handle the largest possible encoding. For example, in a routine that converts "&" characters to "&amp;" for HTML entity encoding, the output buffer needs to be at least 5 times as large as the input buffer.
  • [Implementation] Understand the programming language's underlying representation and how it interacts with numeric calculation (CWE-681). Pay close attention to byte size discrepancies, precision, signed/unsigned distinctions, truncation, conversion and casting between types, "not-a-number" calculations, and how the language handles numbers that are too large or too small for its underlying representation. [REF-7] Also be careful to account for 32-bit, 64-bit, and other potential differences that may affect the numeric representation.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
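The 5.5 base score follows mechanically from this vector. The sketch below implements the CVSS v3.1 base-score equations and metric weights from the FIRST specification, including its float-safe Roundup function; `base_score` is a hypothetical helper name, and temporal and environmental metrics are ignored.

```python
import math

# CVSS v3.1 base metric weights (FIRST CVSS v3.1 specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x: float) -> float:
    """Smallest one-decimal value >= x, using the spec's integer-based Roundup."""
    int_input = round(x * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a vector like 'CVSS:3.1/AV:L/...'."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"expected a CVSS:3.1 vector, got {prefix!r}")
    m = dict(part.split(":") for part in parts)
    changed = m["S"] == "C"
    # Impact Sub-Score combines the three impact metrics.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

For this CVE, only Availability is High, so the impact sub-score is 6.42 × 0.56 ≈ 3.60 and exploitability is 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.83; their sum, about 5.43, rounds up to 5.5.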

Timeline

Published: May 14, 2021
Last Modified: November 21, 2024
First Seen: May 14, 2021

Related Vulnerabilities