CVE-2021-29557: TensorFlow FPE — MEDIUM

CISO Take

A divide-by-zero in TensorFlow's SparseMatMul op allows any local user with low privileges to crash TensorFlow processes by passing an empty tensor — no special knowledge required. Risk is confined to availability; no data exfiltration or code execution path exists. Patch immediately to TF 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4 and enforce input tensor validation at pipeline boundaries.

What is the risk?

Medium-low operational risk. The CVSS score of 5.5 reflects the local-access requirement and the absence of any confidentiality or integrity impact. Exploitability is trivial once the attacker has local code execution (e.g., a shared Jupyter server, a multi-tenant GPU cluster, or a CI/CD runner). The primary concern is shared ML infrastructure, where a single crash disrupts multiple users' training jobs or inference services. There is no evidence of active exploitation and no KEV listing. Patched versions have been available since May 2021, so unpatched instances signal poor patch hygiene rather than zero-day exposure.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	< 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1	2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

Do you run a TensorFlow release older than the patched versions above? You're affected.

How severe is it?

CVSS 3.1: 5.5 / 10

EPSS: 0.2% chance of exploitation in the next 30 days (higher than 9% of all CVEs)

Source: EPSS v3, FIRST.org

Exploitation Status: Exploit available

Exploitation: MEDIUM

Sophistication: Trivial

Exploitation confidence: medium

Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

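CVSS vector: CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High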
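As a cross-check of the 5.5 base score, the CVSS v3.1 base-score equations can be evaluated directly for the vector above. This is a minimal sketch covering base metrics only (no temporal or environmental metrics), using the metric weights and the Roundup function as we read them in the FIRST CVSS v3.1 specification; `base_score` and `roundup` are hypothetical helper names.

```python
import math

# CVSS v3.1 base-metric weights (FIRST specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Smallest one-decimal number >= value, using the spec's float-safe method."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a vector string."""
    # "CVSS:3.1/AV:L/..." -> {"AV": "L", ...}; the version prefix is skipped.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"

    # Impact sub-score from the confidentiality, integrity and availability ratings.
    iss = 1 - ((1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]]))
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0

    # Privileges Required is weighted differently when scope changes.
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]

    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

With only Availability rated High, the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is about 1.83; their sum, roughly 5.43, rounds up to 5.5.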

What should I do?


Patch: Upgrade to TensorFlow 2.5.0 or, on an older release line, to 2.4.2, 2.3.3, 2.2.3, or 2.1.4, which carry the cherry-picked fix (commit 7f283ff).
Input validation: Check operand shapes before any SparseMatMul call and reject empty tensors at model entry points.
Tenant isolation: In shared ML platforms, run each user's TF session in isolated processes/containers so a crash does not propagate.
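Monitoring: Alert on abnormal TF process terminations caused by SIGFPE (signal 8, typically reported as exit status 136 by shells and container runtimes, or as return code -8 by Python's subprocess module) in training and serving infrastructure.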
Inventory: Audit all TF versions deployed across training clusters, CI pipelines, and inference servers — this includes containerized model servers (TF Serving, BentoML, Seldon) built on affected base images.
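The input-validation step can be sketched as a shape guard that runs before the op. This is a minimal sketch, assuming the operand shapes are fully known (static or eager) and passed as plain tuples; `check_sparse_matmul_shapes` and `EmptyTensorError` are hypothetical names, not TensorFlow APIs. It applies the `transpose_a` / `transpose_b` flags before comparing the contracted dimensions, mirroring the corresponding attributes of `tf.raw_ops.SparseMatMul`.

```python
from typing import Optional, Sequence, Tuple

# A shape as plain Python values, e.g. tuple(tensor.shape) in eager TensorFlow.
Shape = Sequence[Optional[int]]


class EmptyTensorError(ValueError):
    """Raised when an operand would reach SparseMatMul with zero elements."""


def check_sparse_matmul_shapes(
    a_shape: Shape,
    b_shape: Shape,
    transpose_a: bool = False,
    transpose_b: bool = False,
) -> Tuple[int, int]:
    """Reject empty or malformed operands; return the (rows, cols) of the product."""
    for name, shape in (("a", a_shape), ("b", b_shape)):
        if len(shape) != 2:
            raise ValueError(f"{name} must be rank 2, got shape {tuple(shape)}")
        if any(dim is None for dim in shape):
            # Unknown dimensions cannot be checked here; validate at runtime instead.
            raise ValueError(f"{name} has unknown dimensions {tuple(shape)}")
        if 0 in shape:
            # An empty `b`, the trigger described in the advisory, is rejected here.
            raise EmptyTensorError(f"{name} is empty (shape {tuple(shape)})")

    # Apply the transpose flags before comparing the contracted dimensions.
    a_rows, a_inner = (a_shape[1], a_shape[0]) if transpose_a else (a_shape[0], a_shape[1])
    b_inner, b_cols = (b_shape[1], b_shape[0]) if transpose_b else (b_shape[0], b_shape[1])
    if a_inner != b_inner:
        raise ValueError(f"inner dimensions differ: {a_inner} vs {b_inner}")
    return a_rows, b_cols


if __name__ == "__main__":
    print(check_sparse_matmul_shapes((2, 3), (3, 4)))  # (2, 4)
    try:
        check_sparse_matmul_shapes((2, 2), (0, 2))  # empty b
    except EmptyTensorError as err:
        print("rejected:", err)
```

In eager code the guard might be called as `check_sparse_matmul_shapes(tuple(a.shape), tuple(b.shape))` just before `tf.raw_ops.SparseMatMul(a=a, b=b)`; in graph code with unknown dimensions, a runtime assertion such as `tf.debugging.assert_greater(tf.size(b), 0)` is the closer analogue. The guard only protects code paths where untrusted data reaches your own call site. Anyone who can run arbitrary TensorFlow code can call the raw op directly, so this complements patching rather than replacing it.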

How is it classified?

Categories: DoS, Framework, Inference

MITRE ATLAS: AML.T0010.001 - AI Software; AML.T0029 - Denial of AI Service; AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Article 15 - Accuracy, robustness and cybersecurity
ISO 42001: A.6.2.5 - AI system availability and resilience
NIST AI RMF: MANAGE 2.2 - Mechanisms to sustain AI risk management
OWASP LLM Top 10: LLM04 - Model Denial of Service

Frequently Asked Questions


Is CVE-2021-29557 actively exploited?

There is no evidence of active exploitation, and the CVE is not on the KEV list. However, proof-of-concept exploit code is publicly available (indexed in trickest/cve), which lowers the bar for exploitation.


What systems are affected by CVE-2021-29557?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, ML development environments, distributed training infrastructure.

What is the CVSS score for CVE-2021-29557?

CVE-2021-29557 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.19%.


What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. An attacker can cause a denial of service via an FPE runtime error in `tf.raw_ops.SparseMatMul`. The division by 0 occurs deep in Eigen code because the `b` tensor is empty. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.

Exploitation Scenario

A malicious insider or a compromised ML developer account on a shared GPU training cluster submits a crafted TensorFlow script that invokes `tf.raw_ops.SparseMatMul` with an intentionally empty `b` tensor. The Eigen backend performs a division by zero, raising SIGFPE and killing the TensorFlow worker process. In a tf.distribute.MirroredStrategy job, where all GPU replicas share one process, this aborts the entire multi-GPU training run and can potentially corrupt checkpoint state. On a shared notebook server, the crash kills the kernel process and interrupts every user whose session runs in it. The attacker needs only low-privilege code execution: a shared Jupyter login or a poisoned notebook file is sufficient.
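Because the attacker in this scenario writes the script, input validation in shared code does not stop them; what limits the blast radius is patching plus running each job in its own process. The sketch below combines the tenant-isolation and monitoring steps, assuming jobs are launched as separate child processes on a POSIX host. `run_isolated` and `describe_termination` are hypothetical helpers, and the print to stderr stands in for a real alerting client. The parent survives the child's SIGFPE and reports it by signal name.

```python
import signal
import subprocess
import sys
from typing import Optional, Sequence, Tuple


def describe_termination(returncode: int) -> str:
    """Translate a child's return code into 'ok', 'exit N', or a signal name."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # subprocess reports death by signal N as -N on POSIX.
        signum = -returncode
    elif returncode > 128:
        # Shells and container runtimes usually report it as 128 + N instead.
        # Heuristic: a program can also exit with such a value on purpose.
        signum = returncode - 128
    else:
        return f"exit {returncode}"
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"signal {signum}"


def run_isolated(argv: Sequence[str], timeout: Optional[float] = None) -> Tuple[int, str]:
    """Run one job in its own process so a crash cannot take down the caller."""
    completed = subprocess.run(list(argv), timeout=timeout)
    cause = describe_termination(completed.returncode)
    if cause == "SIGFPE":
        # Alerting hook; printing stands in for a real monitoring client.
        print(f"ALERT: job {list(argv)!r} died with SIGFPE", file=sys.stderr)
    return completed.returncode, cause


if __name__ == "__main__":
    # Stand-in for the crashing TensorFlow worker: the child raises SIGFPE itself.
    crasher = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGFPE)"]
    code, cause = run_isolated(crasher)
    print(f"worker returned {code} ({cause}); supervisor is still running")
```

In a real deployment the child command would be the user's training job or notebook kernel. Container orchestrators typically surface the same condition as exit code 136, which the `128 + N` branch maps back to SIGFPE.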