CVE-2021-29557: TensorFlow: FPE in SparseMatMul causes process DoS
MEDIUM | PoC AVAILABLE
A divide-by-zero in TensorFlow's SparseMatMul op allows any local user with low privileges to crash TensorFlow processes by passing an empty tensor, with no special knowledge required. Risk is confined to availability; no data exfiltration or code execution path exists. Patch immediately to TF 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4 and enforce input tensor validation at pipeline boundaries.
Risk Assessment
Medium-low operational risk. CVSS 5.5 reflects local access requirement and no confidentiality or integrity impact. Exploitability is trivial once the attacker has local execution (e.g., shared Jupyter, multi-tenant GPU cluster, CI/CD runner). Primary concern is in shared ML infrastructure where a single crash disrupts multiple users' training jobs or inference services. No evidence of active exploitation or KEV listing. Patched versions have been available since May 2021, so unpatched instances signal poor patch hygiene rather than a zero-day exposure.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| tensorflow | pip | < 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1 | 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0 |
Any deployment running a tensorflow release older than 2.5.0 without one of the backported fixes (2.4.2, 2.3.3, 2.2.3, 2.1.4) is affected.
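A minimal sketch of how an inventory script might classify an installed version against the fixed releases named in this advisory. The function name `is_vulnerable` and the `FIXED_PATCH` table are illustrative, not part of any TensorFlow API. Two assumptions are made: branches older than 2.1 are treated as vulnerable because the advisory lists no backport for them, and pre-release suffixes such as `rc0` are ignored.

```python
import re

# Minimum fixed patch level for each release branch named in the advisory.
FIXED_PATCH = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def is_vulnerable(version: str) -> bool:
    """Return True if `version` predates the fix for CVE-2021-29557."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) >= (2, 5):
        return False  # 2.5.0 and later include the fix
    fixed = FIXED_PATCH.get((major, minor))
    if fixed is None:
        # Assumption: branches without a listed backport remain vulnerable.
        return True
    return patch < fixed
```

In practice the version string could come from `tf.__version__` or from `importlib.metadata.version("tensorflow")` inside each image or environment being audited.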
Recommended Action
1. Patch: Upgrade to TensorFlow 2.5.0, or to one of the backport releases (2.4.2, 2.3.3, 2.2.3, or 2.1.4) that carry the fix from commit 7f283ff.
2. Input validation: Add shape assertions before any SparseMatMul call, and reject empty tensors at model entry points.
3. Tenant isolation: In shared ML platforms, run each user's TF session in isolated processes/containers so a crash does not propagate.
4. Monitoring: Alert on abnormal TF process terminations caused by SIGFPE (signal 8, which shells typically report as exit status 136) in training and serving infrastructure.
5. Inventory: Audit all TF versions deployed across training clusters, CI pipelines, and inference servers. This includes containerized model servers (TF Serving, BentoML, Seldon) built on affected base images.
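The input-validation step can be sketched as a small guard that runs before the op is invoked. This is an illustration under stated assumptions, not TensorFlow's own fix: the helper name `check_matmul_shapes` is hypothetical, and it takes plain shape sequences (for a TensorFlow tensor `t`, something like `t.shape.as_list()`) so that it has no dependency on TensorFlow itself.

```python
from typing import Optional, Sequence


def check_matmul_shapes(a_shape: Sequence[Optional[int]],
                        b_shape: Sequence[Optional[int]]) -> None:
    """Raise ValueError unless both operands are non-empty 2-D matrices."""
    for name, shape in (("a", a_shape), ("b", b_shape)):
        if len(shape) != 2:
            raise ValueError(f"operand {name} must be 2-D, got shape {tuple(shape)}")
        # A zero (or unknown) dimension means the tensor may be empty,
        # which is the condition that triggers the divide-by-zero.
        if any(dim is None or dim <= 0 for dim in shape):
            raise ValueError(f"operand {name} is empty or has unknown size: {tuple(shape)}")
```

Calling `check_matmul_shapes(a.shape.as_list(), b.shape.as_list())` at a model entry point turns the crash into an ordinary Python exception that the caller can handle. The guard deliberately does not check that the inner dimensions agree, since that depends on the op's transpose flags.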
Frequently Asked Questions
What is CVE-2021-29557?
CVE-2021-29557 is a divide-by-zero in TensorFlow's SparseMatMul op (`tf.raw_ops.SparseMatMul`). A local user with low privileges can crash the TensorFlow process by passing an empty `b` tensor. The impact is limited to availability; there is no data exfiltration or code execution path. The issue is fixed in TF 2.5.0, 2.4.2, 2.3.3, 2.2.3, and 2.1.4.
Is CVE-2021-29557 actively exploited?
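There is no evidence of active exploitation, and CVE-2021-29557 is not listed in KEV. Proof-of-concept exploit code is publicly available, however, which lowers the bar for anyone who already has local code execution.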
How to fix CVE-2021-29557?
Upgrade to TensorFlow 2.5.0, or to one of the backport releases (2.4.2, 2.3.3, 2.2.3, or 2.1.4) that carry the fix from commit 7f283ff. Where upgrading is delayed, reduce exposure with the remaining steps: add shape assertions that reject empty tensors before any SparseMatMul call; run each user's TF session in an isolated process or container on shared ML platforms; alert on TF processes terminated by SIGFPE (signal 8, which shells typically report as exit status 136); and audit the TF versions deployed across training clusters, CI pipelines, and inference servers, including containerized model servers (TF Serving, BentoML, Seldon) built on affected base images.
What systems are affected by CVE-2021-29557?
This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, ML development environments, distributed training infrastructure.
What is the CVSS score for CVE-2021-29557?
CVE-2021-29557 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.01%.
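As a worked check, the 5.5 base score can be reproduced from this CVE's vector, `CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H`. The sketch below follows the base-score equations and metric weights of the CVSS v3.1 specification, but it is deliberately partial: it only handles Scope: Unchanged, and the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged CVSS v3.1 vector."""
    parts = dict(item.split(":") for item in vector.split("/"))
    if parts.get("S") != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {metric: WEIGHTS[metric][parts[metric]] for metric in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))
```

For this vector the impact sub-score is about 3.60 (availability only) and the exploitability sub-score is about 1.83, so the sum of roughly 5.43 rounds up to 5.5.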
Technical Details
NVD Description
TensorFlow is an end-to-end open source platform for machine learning. An attacker can cause a denial of service via a FPE runtime error in `tf.raw_ops.SparseMatMul`. The division by 0 occurs deep in Eigen code because the `b` tensor is empty. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
Exploitation Scenario
A malicious insider or compromised ML developer account on a shared GPU training cluster submits a crafted TensorFlow script that invokes `tf.raw_ops.SparseMatMul` with an intentionally empty `b` tensor. The Eigen backend performs a division by zero, which raises SIGFPE and kills the TensorFlow worker process. In a multi-GPU training job (for example, one using tf.distribute.MirroredStrategy), this aborts the entire training run and may leave an in-progress checkpoint write incomplete. On a shared notebook server, the crash terminates the kernel and can disrupt other users who share the same runtime pod. The attacker needs only low-privilege code execution: a shared Jupyter login or a poisoned notebook file is sufficient.
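From a supervising process, a crash of this kind shows up as a signal termination rather than an ordinary non-zero exit. The following is a minimal sketch of how a job launcher might tell the two apart. It is POSIX-only, and the function name `run_and_classify` and its return labels are hypothetical, not taken from any monitoring product.

```python
import signal
import subprocess
import sys


def run_and_classify(cmd):
    """Run `cmd` and report how it ended: 'ok', 'sigfpe', or 'failed'."""
    proc = subprocess.run(cmd)
    # On POSIX, subprocess reports death by signal N as returncode -N.
    if proc.returncode == -signal.SIGFPE:
        return "sigfpe"
    if proc.returncode != 0:
        return "failed"
    return "ok"


if __name__ == "__main__":
    # Stand-in for a crashing worker: a child that sends SIGFPE to itself.
    crashing_child = [
        sys.executable,
        "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGFPE)",
    ]
    print(run_and_classify(crashing_child))
```

A launcher that sees `sigfpe` could raise an alert and record which user's job triggered it. When the worker is started from a shell instead, the same event is usually visible as exit status 136 (128 plus signal number 8).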
Weaknesses (CWE)
CWE-369: Divide By Zero
CVSS Vector
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
- github.com/tensorflow/tensorflow/commit/7f283ff806b2031f407db64c4d3edcda8fb9f9f5 (Patch, 3rd Party)
- github.com/tensorflow/tensorflow/security/advisories/GHSA-xw93-v57j-fcgh (Exploit, Patch, 3rd Party)
Related Vulnerabilities
- CVE-2020-15196 (CVSS 9.9) TensorFlow: heap OOB read in sparse/ragged count ops. Same package: tensorflow
- CVE-2020-15205 (CVSS 9.8) TensorFlow: heap overflow in StringNGrams, ASLR bypass. Same package: tensorflow
- CVE-2020-15208 (CVSS 9.8) TFLite: OOB read/write via tensor dimension mismatch. Same package: tensorflow
- CVE-2019-16778 (CVSS 9.8) TensorFlow: heap overflow in UnsortedSegmentSum op. Same package: tensorflow
- CVE-2022-23587 (CVSS 9.8) TensorFlow: integer overflow in Grappler enables RCE. Same package: tensorflow