CVE-2021-29557: TensorFlow: FPE in SparseMatMul causes process DoS

MEDIUM PoC AVAILABLE
Published May 14, 2021
CISO Take

A divide-by-zero in TensorFlow's SparseMatMul op lets any local user with low privileges crash a TensorFlow process by passing an empty `b` tensor; no special knowledge is required. The impact is limited to availability: there is no data exfiltration or code execution path. Upgrade to TF 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4 and validate input tensors at pipeline boundaries.

Risk Assessment

Medium-low operational risk. CVSS 5.5 reflects local access requirement and no confidentiality or integrity impact. Exploitability is trivial once the attacker has local execution (e.g., shared Jupyter, multi-tenant GPU cluster, CI/CD runner). Primary concern is in shared ML infrastructure where a single crash disrupts multiple users' training jobs or inference services. No evidence of active exploitation or KEV listing. Patched versions have been available since May 2021, so unpatched instances signal poor patch hygiene rather than a zero-day exposure.

Affected Systems

Package | Ecosystem | Vulnerable Range | Patched
tensorflow | pip | not listed | 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

Do you use tensorflow? Releases earlier than the patched versions (2.5.0, 2.4.2, 2.3.3, 2.2.3, 2.1.4) are affected.

Severity & Risk

CVSS 3.1
5.5 / 10
EPSS
0.01%
chance of exploitation in 30 days
Higher than 1% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Trivial
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV Local
AC Low
PR Low
UI None
S Unchanged
C None
I None
A High

Recommended Action

5 steps
  1. Patch: Upgrade to TensorFlow 2.5.0, or to one of the backported releases 2.4.2 / 2.3.3 / 2.2.3 / 2.1.4 (fix commit 7f283ff).

  2. Input validation: Add shape assertions before any SparseMatMul call and reject empty tensors at model entry points.

  3. Tenant isolation: In shared ML platforms, run each user's TF session in an isolated process or container so that a crash does not propagate.

  4. Monitoring: Alert on abnormal TF process terminations (SIGFPE, signal 8; a shell typically reports this as exit status 136) in training and serving infrastructure.

  5. Inventory: Audit all TF versions deployed across training clusters, CI pipelines, and inference servers, including containerized model servers (TF Serving, BentoML, Seldon) built on affected base images.
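The input validation in step 2 could be sketched roughly as follows. This is a minimal illustration, not TensorFlow's own fix: `check_non_empty` and `guarded_sparse_matmul` are hypothetical helper names, and the sketch assumes both operands expose a fully known static `.shape` that can be converted to a tuple.

```python
from typing import Callable, Optional, Sequence


def check_non_empty(shape: Sequence[Optional[int]], name: str) -> None:
    """Raise ValueError if any statically known dimension is zero.

    Dimensions reported as None (unknown) are skipped; that is an assumption
    of this sketch, and a stricter policy could reject them as well.
    """
    for axis, dim in enumerate(shape):
        if dim == 0:
            raise ValueError(
                f"{name} is empty: dimension {axis} of shape {tuple(shape)} is 0"
            )


def guarded_sparse_matmul(a, b, matmul: Callable):
    """Validate both operand shapes, then delegate to the supplied callable.

    `matmul` is passed in so the guard itself carries no TensorFlow dependency.
    """
    check_non_empty(tuple(a.shape), "a")
    check_non_empty(tuple(b.shape), "b")
    return matmul(a, b)
```

With TensorFlow installed, the callable might be `lambda x, y: tf.raw_ops.SparseMatMul(a=x, b=y)`. The guard raises an ordinary Python exception before the op is reached, so the caller can handle a bad input without losing the process.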

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.5 - AI system availability and resilience
NIST AI RMF
MANAGE 2.2 - Mechanisms to sustain AI risk management
OWASP LLM Top 10
LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2021-29557?

CVE-2021-29557 is a divide-by-zero (floating point exception) in `tf.raw_ops.SparseMatMul`. When the `b` tensor is empty, the division by zero occurs inside Eigen and crashes the TensorFlow process, causing a denial of service. It is fixed in TensorFlow 2.5.0, 2.4.2, 2.3.3, 2.2.3, and 2.1.4.

Is CVE-2021-29557 actively exploited?

There is no evidence of active exploitation, and the CVE is not listed in CISA KEV. Proof-of-concept code is publicly available (indexed in trickest/cve), which lowers the effort needed to exploit it.

How to fix CVE-2021-29557?

  1. Patch: Upgrade to TensorFlow 2.5.0, or to one of the backported releases 2.4.2 / 2.3.3 / 2.2.3 / 2.1.4 (fix commit 7f283ff).

  2. Input validation: Add shape assertions before any SparseMatMul call and reject empty tensors at model entry points.

  3. Tenant isolation: In shared ML platforms, run each user's TF session in an isolated process or container so that a crash does not propagate.

  4. Monitoring: Alert on abnormal TF process terminations (SIGFPE, signal 8; a shell typically reports this as exit status 136) in training and serving infrastructure.

  5. Inventory: Audit all TF versions deployed across training clusters, CI pipelines, and inference servers, including containerized model servers (TF Serving, BentoML, Seldon) built on affected base images.

What systems are affected by CVE-2021-29557?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, ML development environments, distributed training infrastructure.

What is the CVSS score for CVE-2021-29557?

CVE-2021-29557 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.01%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. An attacker can cause a denial of service via a FPE runtime error in `tf.raw_ops.SparseMatMul`. The division by 0 occurs deep in Eigen code because the `b` tensor is empty. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
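For inventory purposes, the fixed releases named in the description can be turned into a simple version check. The sketch below is an assumption-laden illustration: `is_patched` is a hypothetical helper, it accepts only plain `major.minor.patch` strings (pre-release suffixes are rejected rather than guessed at), and it treats release lines older than 2.1 as unpatched because the description lists no backport for them.

```python
import re

# First fixed patch release per 2.x minor line, taken from the description above.
FIXED_PATCH = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def is_patched(version: str) -> bool:
    """Return True if a TensorFlow version string includes the fix."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unsupported version format: {version!r}")
    major, minor, patch = map(int, match.groups())
    if (major, minor) >= (2, 5):
        return True  # 2.5.0 and later ship with the fix
    fixed = FIXED_PATCH.get((major, minor))
    # Lines with no listed backport (older than 2.1) are treated as unpatched.
    return fixed is not None and patch >= fixed


for candidate in ("2.4.1", "2.4.2", "2.0.4", "2.5.0"):
    print(candidate, "patched" if is_patched(candidate) else "VULNERABLE")
```

In a real audit the input would typically come from `tf.__version__` or from a package manifest rather than a hard-coded list.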

Exploitation Scenario

A malicious insider or a compromised ML developer account on a shared GPU training cluster submits a TensorFlow script that calls `tf.raw_ops.SparseMatMul` with an intentionally empty `b` tensor. The Eigen backend divides by zero, which raises SIGFPE and kills the TensorFlow worker process. In a job using tf.distribute.MirroredStrategy, where all replicas run in a single process, this aborts the entire multi-GPU training run and may leave a checkpoint that was being written incomplete. On a shared notebook server, the crash terminates the kernel for every user sharing that runtime. The attacker needs only low-privilege code execution: a shared Jupyter login or a poisoned notebook file is sufficient.
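The effect of process isolation on this scenario can be illustrated without TensorFlow. The sketch below is a simulation under stated assumptions: the child interpreter sends itself SIGFPE as a stand-in for the crashing op, the names `run_isolated` and `killed_by_sigfpe` are hypothetical, and the return-code convention shown applies to POSIX systems.

```python
import signal
import subprocess
import sys

# Stand-in for the crashing workload: the child sends itself SIGFPE, the
# signal named in the scenario. No TensorFlow op is actually executed.
SIMULATED_CRASH = "import os, signal; os.kill(os.getpid(), signal.SIGFPE)"


def run_isolated(code: str) -> int:
    """Run Python source in a separate interpreter and return its return code."""
    return subprocess.run([sys.executable, "-c", code]).returncode


def killed_by_sigfpe(returncode: int) -> bool:
    """On POSIX, subprocess reports death by signal N as return code -N."""
    return returncode == -signal.SIGFPE


rc = run_isolated(SIMULATED_CRASH)
# The parent is still running here; only the child process was lost.
print("child return code:", rc, "| killed by SIGFPE:", killed_by_sigfpe(rc))
```

A supervisor built along these lines can both contain the crash and feed the signal information into alerting, since the parent observes how the child died instead of dying with it.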

Weaknesses (CWE)

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
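The 5.5 base score can be reproduced from this vector using the CVSS v3.1 base score equations. The following is an illustrative sketch rather than a reference implementation: the function names are hypothetical, only base metrics are handled, and the weights are those published in the v3.1 specification.

```python
import math

# Base metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/...' base vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    scope = metrics["S"]
    c, i, a = (WEIGHTS[k][metrics[k]] for k in ("C", "I", "A"))
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[scope][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(total if scope == "U" else 1.08 * total, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

For this vector the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is about 1.83; their sum, roughly 5.43, rounds up to 5.5.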

Timeline

Published
May 14, 2021
Last Modified
November 21, 2024
First Seen
May 14, 2021

Related Vulnerabilities