CVE-2021-29521: TensorFlow: DoS crash via negative sparse tensor shape

MEDIUM PoC AVAILABLE
Published May 14, 2021
CISO Take

A local attacker or malicious ML workload can crash any TensorFlow process by passing a negative value as dense_shape in SparseCountSparseOutput, causing a segfault with no recovery. Upgrade to TensorFlow 2.5.0, 2.4.2, or 2.3.3 immediately on all training infrastructure and model-serving nodes. Risk is elevated in multi-tenant ML environments (shared notebooks, inference APIs) where untrusted users can submit tensor operations.

Risk Assessment

Medium risk in isolated training environments; elevated in shared or exposed deployments. CVSS 5.5 (Local/Low complexity/Low privilege) understates real-world exposure in ML platforms where notebook users or API callers can invoke raw TF ops. No confidentiality or integrity impact, but availability impact is high — a single malformed tensor call terminates the process. Not in CISA KEV and no evidence of active exploitation, but the exploit primitive is trivially reproducible from the public advisory.

Affected Systems

Package: tensorflow
Ecosystem: pip
Vulnerable range: 2.3.x before 2.3.3; 2.4.x before 2.4.2
Patched: 2.3.3, 2.4.2, 2.5.0

Do you run an unpatched TensorFlow release (2.3.x before 2.3.3, or 2.4.x before 2.4.2)? You're affected.
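To check whether a given install carries the fix, you can compare the version reported by `pip show tensorflow` against the patched release on its branch. The sketch below is a minimal illustration under stated assumptions: `is_patched` is a hypothetical helper, it only understands plain `MAJOR.MINOR.PATCH` strings, pre-releases are conservatively reported as unpatched, and branches older than 2.3 are also reported as unpatched because the advisory names no fix for them.

```python
import re

# Fixed releases named in the advisory, keyed by (major, minor) release branch.
PATCHED_ON_BRANCH = {(2, 3): (2, 3, 3), (2, 4): (2, 4, 2)}
FIRST_FIXED_RELEASE = (2, 5, 0)  # 2.5.0 and later ship the fix

_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def is_patched(version: str) -> bool:
    """Return True if a TensorFlow version string includes the CVE-2021-29521 fix.

    Hypothetical helper. Pre-releases and other non-numeric versions
    (e.g. "2.5.0rc1") are conservatively reported as unpatched.
    """
    match = _RELEASE_RE.match(version.strip())
    if match is None:
        return False
    release = tuple(int(part) for part in match.groups())
    if release >= FIRST_FIXED_RELEASE:
        return True
    branch_fix = PATCHED_ON_BRANCH.get(release[:2])
    if branch_fix is None:
        # The advisory names no fix for branches older than 2.3: upgrade them.
        return False
    return release >= branch_fix


# Example against the local install (Python 3.8+):
#   from importlib.metadata import version
#   print(is_patched(version("tensorflow")))
```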

Severity & Risk

CVSS 3.1: 5.5 / 10 (MEDIUM)
EPSS: 0.01% chance of exploitation in the next 30 days (higher than 1% of all CVEs)
Exploitation status: Exploit available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation confidence: Medium (public PoC indexed in trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

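Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High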

Recommended Action

5 steps
  1. Patch: Upgrade to TensorFlow 2.5.0, 2.4.2 (2.4.x branch), or 2.3.3 (2.3.x branch). Verify with `pip show tensorflow`.

  2. Input validation: Add explicit checks that all elements of dense_shape tensors are non-negative before passing to SparseCountSparseOutput or equivalent ops.

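  3. Isolation: Run TF inference/training processes under process supervisors (systemd, Kubernetes `restartPolicy: Always`) so they auto-recover from crashes. Restarts shorten each outage but cannot outpace an attacker who repeats the malicious call, so pair this with steps 2 and 4.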

  4. Least privilege: Restrict which users or API clients can invoke raw TF ops in shared environments.

  5. Detection: Alert on repeated abnormal process terminations of TF serving workers; correlate with input payloads containing negative shape values.
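Step 2 can be approximated with a thin guard in front of the op. This is a minimal sketch, assuming your code controls the call path before arguments reach TensorFlow: `validate_dense_shape` and `checked_sparse_count` are hypothetical names, and the op is passed in as a callable so the guard can be exercised without TensorFlow installed.

```python
from typing import Any, Callable, Sequence


def validate_dense_shape(dense_shape: Sequence[int]) -> None:
    """Raise ValueError unless every dimension of dense_shape is non-negative."""
    # Convert to plain ints so lists, tuples and 1-D NumPy arrays all work.
    dims = [int(d) for d in dense_shape]
    if any(d < 0 for d in dims):
        raise ValueError(f"dense_shape must be non-negative, got {dims}")


def checked_sparse_count(op: Callable[..., Any], *, indices: Any, values: Any,
                         dense_shape: Sequence[int], weights: Any,
                         binary_output: bool, **kwargs: Any) -> Any:
    """Validate dense_shape, then forward every argument unchanged to `op`.

    In production `op` would be tf.raw_ops.SparseCountSparseOutput; taking it
    as a parameter keeps this hypothetical guard testable without TensorFlow.
    """
    validate_dense_shape(dense_shape)
    return op(indices=indices, values=values, dense_shape=dense_shape,
              weights=weights, binary_output=binary_output, **kwargs)
```

In a real deployment the call would look like `checked_sparse_count(tf.raw_ops.SparseCountSparseOutput, indices=..., values=..., dense_shape=..., weights=..., binary_output=False)`. The guard only protects code paths you own; anyone who can call `tf.raw_ops` directly bypasses it, which is why step 4 still matters.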

Classification

Compliance Impact

This CVE is relevant to:

ISO 42001
8.2 - AI risk assessment
NIST AI RMF
GOVERN 6.1 - Policies and procedures for AI supply chain risk
MANAGE 2.2 - Mechanisms to sustain reliable operation
OWASP LLM Top 10
LLM03:2025 - Supply Chain

Frequently Asked Questions

What is CVE-2021-29521?

CVE-2021-29521 is a denial-of-service bug in TensorFlow: passing a negative value in the dense_shape argument of `tf.raw_ops.SparseCountSparseOutput` crashes the process with a segmentation fault. Anyone who can submit tensor operations to an unpatched TensorFlow process (for example, through a shared notebook or an inference API) can take it down. The fix ships in TensorFlow 2.5.0, 2.4.2, and 2.3.3.

Is CVE-2021-29521 actively exploited?

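There is no evidence of active exploitation, and CVE-2021-29521 is not listed in CISA KEV. However, proof-of-concept exploit code is publicly available (indexed in trickest/cve), and the crash is trivial to reproduce from the public advisory, so exploiting it takes little effort.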

How to fix CVE-2021-29521?

Upgrade to TensorFlow 2.5.0, 2.4.2 (2.4.x branch), or 2.3.3 (2.3.x branch) and confirm the installed version with `pip show tensorflow`. Until every node is patched, reject any dense_shape containing a negative element before it reaches SparseCountSparseOutput, restrict which users or API clients can invoke raw TF ops in shared environments, run workers under a supervisor that restarts them after a crash, and alert on repeated abnormal terminations of TF serving workers.
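The detection advice boils down to counting abnormal worker exits in a sliding time window and alerting past a threshold. A minimal sketch, assuming you already receive (timestamp, terminating signal) events from your supervisor: the class name `CrashRateMonitor`, the default of 5 crashes in 60 seconds, and the focus on SIGSEGV are illustrative choices, not values taken from the advisory.

```python
import signal
from collections import deque


class CrashRateMonitor:
    """Alert when a worker segfaults `threshold` times within `window_s` seconds.

    Hypothetical helper. Assumes exit events arrive in timestamp order.
    """

    def __init__(self, threshold: int = 5, window_s: float = 60.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._crash_times = deque()  # timestamps of recent SIGSEGV exits

    def record_exit(self, timestamp: float, exit_signal: int) -> bool:
        """Record one worker exit; return True once the alert threshold is reached."""
        # Only segmentation faults match the crash signature of this CVE.
        if exit_signal != signal.SIGSEGV:
            return False
        self._crash_times.append(timestamp)
        # Evict crashes that fell outside the sliding window; the newest entry
        # is never evicted because timestamp - timestamp == 0.
        while timestamp - self._crash_times[0] > self.window_s:
            self._crash_times.popleft()
        return len(self._crash_times) >= self.threshold
```

Container runtimes typically report a process killed by SIGSEGV as exit code 139 (128 + 11), so subtract 128 before passing such codes as `exit_signal`. Correlating an alert with request payloads that carried negative shape values depends on what your serving layer logs.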

What systems are affected by CVE-2021-29521?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML environments, data preprocessing pipelines.

What is the CVSS score for CVE-2021-29521?

CVE-2021-29521 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.01%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. Specifying a negative dense shape in `tf.raw_ops.SparseCountSparseOutput` results in a segmentation fault being thrown out from the standard library as `std::vector` invariants are broken. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/8f7b60ee8c0206a2c99802e3a4d1bb55d2bc0624/tensorflow/core/kernels/count_ops.cc#L199-L213) assumes the first element of the dense shape is always positive and uses it to initialize a `BatchedMap<T>` (i.e., `std::vector<absl::flat_hash_map<int64,T>>` (https://github.com/tensorflow/tensorflow/blob/8f7b60ee8c0206a2c99802e3a4d1bb55d2bc0624/tensorflow/core/kernels/count_ops.cc#L27)) data structure. If the `shape` tensor has more than one element, `num_batches` is the first value in `shape`. Ensuring that the `dense_shape` argument is a valid tensor shape (that is, all elements are non-negative) solves this issue. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2 and TensorFlow 2.3.3.
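The description above contrasts two pieces of logic: how `num_batches` is taken from the shape, and the fix that rejects any negative dimension. The pure-Python model below mirrors that description so the difference is easy to see; it is not the C++ kernel, the function names are hypothetical, and treating a one-element shape as a single batch is an assumption of this sketch.

```python
from typing import Sequence


def num_batches_unchecked(shape: Sequence[int]) -> int:
    """Pre-fix logic as described: trust shape[0] when shape has >1 element."""
    # Assumption: a one-element shape is treated as a single batch.
    return shape[0] if len(shape) > 1 else 1


def num_batches_checked(shape: Sequence[int]) -> int:
    """Post-fix logic as described: reject any negative dimension first."""
    if any(dim < 0 for dim in shape):
        raise ValueError(f"dense_shape must be a valid tensor shape, got {list(shape)}")
    return num_batches_unchecked(shape)


print(num_batches_unchecked([-1, 10]))  # -1: per the advisory, this value sizes the BatchedMap
```

Note that the check covers every element, not just the first: a shape such as `[3, -2]` is also rejected, matching the advisory's "all elements are non-negative" wording.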

Exploitation Scenario

An adversary with access to a shared ML notebook environment or a model-serving API that exposes TF raw ops calls `tf.raw_ops.SparseCountSparseOutput(indices=..., values=..., dense_shape=[-1, 10], weights=..., binary_output=False)`. The negative first element (-1) is passed directly as `num_batches` to initialize a `std::vector<absl::flat_hash_map>`, violating vector invariants and triggering a segfault. The TF process crashes immediately with no exception handling possible at the application layer. In a Kubernetes-hosted inference cluster, the attacker can loop this call to repeatedly crash pods faster than they restart, achieving sustained denial of service against the ML serving endpoint.
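Because a segfault kills the whole interpreter, a `try`/`except` around the op cannot catch it. One common mitigation pattern, sketched here under the assumption that untrusted tensor work can be moved out of the main serving process, is to run it in a child process and inspect how the child exited. The child below simulates the crash by sending itself SIGSEGV instead of calling TensorFlow, so the example runs without TensorFlow; `run_isolated` is a hypothetical helper, and the negative-return-code convention is POSIX-specific.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout_s: float = 30.0) -> str:
    """Run `code` in a fresh Python interpreter and classify how it exited."""
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, timeout=timeout_s)
    # On POSIX, a child killed by signal N reports returncode == -N.
    if result.returncode == -signal.SIGSEGV:
        return "crashed: SIGSEGV"
    if result.returncode != 0:
        return f"failed: exit code {result.returncode}"
    return "ok"


# Simulated crash: the child segfaults itself instead of calling TensorFlow.
CRASHING_CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
print(run_isolated(CRASHING_CHILD))  # the parent keeps running either way
print(run_isolated("print('fine')"))
```

The parent can then reject the offending request instead of going down with it. Spawning one interpreter per request is expensive; a pool of long-lived worker processes is the usual trade-off, combined with the supervised restarts from step 3.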

Weaknesses (CWE)

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
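The 5.5 base score follows directly from this vector. The sketch below implements the CVSS v3.1 base-score equations and the `Roundup` function from the FIRST specification for base metrics only (temporal and environmental metrics are ignored); `base_score` is a hypothetical helper name.

```python
import math

# CVSS v3.1 base metric weights from the FIRST specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Smallest one-decimal number >= value, per the v3.1 Roundup definition."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a vector string."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(part.split(":") for part in parts[1:])
    changed = m["S"] == "C"

    # Impact sub-score (ISS) combines the C/I/A weights.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

For this CVE, only Availability contributes to impact (6.42 × 0.56 ≈ 3.60), and the local, low-privilege exploitability term is about 1.83, giving 5.43, which `Roundup` lifts to 5.5.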

Timeline

Published: May 14, 2021
Last modified: November 21, 2024
First seen: May 14, 2021

Related Vulnerabilities