CVE-2021-29524: TensorFlow div-by-zero DoS

CISO Take

A local attacker with low privileges can crash any TensorFlow process by passing a zero-divisor to the Conv2DBackpropFilter raw op, causing a divide-by-zero and process termination. Primary risk is disruption of CNN training jobs in shared ML environments or multi-tenant training platforms. Patch to TF 2.5.0 / 2.4.2 / 2.3.3 / 2.2.3 / 2.1.4 immediately; no workaround exists short of blocking access to raw ops.

What is the risk?

MEDIUM-LOW in isolated environments, MEDIUM in shared or multi-tenant ML infrastructure. CVSS 5.5 reflects local vector, but in practice any user with access to the TensorFlow runtime can terminate training jobs deterministically. No confidentiality or integrity impact, but availability impact is HIGH — a single malformed op call kills the process. Risk elevates in Jupyter/notebook environments, shared HPC clusters, and MLaaS platforms where multiple teams share a TF runtime.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	< 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1	2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

Do you run a TensorFlow release older than 2.5.0 without one of the backported fixes (2.4.2, 2.3.3, 2.2.3, 2.1.4)? You're affected.

How severe is it?

CVSS 3.1

5.5 / 10

EPSS

0.2%

chance of exploitation in 30 days

Higher than 9% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Trivial

Exploitation Confidence

medium

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Local

AC Low

PR Low

UI None

S Unchanged

C None

I None

A High
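
The metrics above correspond to the vector string CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H. As a rough cross-check, the sketch below recomputes the 5.5 base score from those metrics, using the weights and rounding rule published in the CVSS v3.1 specification. It covers only the Scope: Unchanged case, and the function names are illustrative rather than taken from any official calculator.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """Round up to one decimal place, following the CVSS v3.1 definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """Compute the base score for a vector whose scope is Unchanged."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:L / AC:L / PR:L / UI:N / C:N / I:N / A:H
score = base_score_scope_unchanged("L", "L", "L", "N", "N", "N", "H")
print(score)
```

The impact sub-score (about 3.6) comes entirely from the High availability rating, while the Local attack vector keeps the exploitability sub-score low (about 1.8).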

What should I do?

5 steps

PATCH

Upgrade to TensorFlow 2.5.0, or apply backports 2.4.2 / 2.3.3 / 2.2.3 / 2.1.4. Commit fca9874 resolves the issue.

RESTRICT

Audit who can submit arbitrary tf.raw_ops calls in your environment and restrict that access to trusted identities.

ISOLATE

Run training jobs in separate containers/processes per user/team to limit blast radius.

DETECT

Monitor for unexpected TF process crashes or SIGFPE signals in training infrastructure logs.

HARDEN

Disable direct tf.raw_ops exposure in any public-facing model-serving API.
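
For the PATCH step, a small check like the following can flag unpatched installations. It is a minimal sketch: the helper name is_patched is hypothetical, the patched releases are the ones listed in the advisory, and it assumes that every release from 2.5.0 onward carries the fix and that release series not named in the advisory should be treated as unpatched.

```python
import re

# Patched releases named in the advisory, keyed by (major, minor) series.
PATCHED_BACKPORTS = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def is_patched(version):
    """Return True if a TensorFlow version string is at or above a fixed release."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor, patch = (int(group) for group in match.groups())
    if (major, minor) >= (2, 5):
        return True  # assumption: 2.5.0 and later include the fix
    if (major, minor) in PATCHED_BACKPORTS:
        return patch >= PATCHED_BACKPORTS[(major, minor)]
    return False  # assumption: series not listed in the advisory count as unpatched


print(is_patched("2.4.1"), is_patched("2.4.2"))
```

In a real environment the argument would typically be tensorflow.__version__, read inside each image or virtual environment that ships TensorFlow.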

How is it classified?

DoS Framework Training Data

AML.T0010.001 - AI Software

AML.T0029 - Denial of AI Service

AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2.6 - AI system robustness and security

NIST AI RMF

GOVERN 1.7 - Processes for decommissioning and patching AI systems

MANAGE 2.2 - Residual risk tracking and treatment

Frequently Asked Questions

What is CVE-2021-29524?

CVE-2021-29524 is a division-by-zero flaw in TensorFlow's tf.raw_ops.Conv2DBackpropFilter. The implementation performs a modulus operation whose divisor is controlled by the caller, so a local, low-privileged user can supply a zero divisor and terminate the TensorFlow process. The impact is denial of service only. The fix is included in TensorFlow 2.5.0 and backported to 2.4.2, 2.3.3, 2.2.3 and 2.1.4.

Is CVE-2021-29524 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2021-29524, increasing the risk of exploitation.

How to fix CVE-2021-29524?

1. PATCH: Upgrade to TensorFlow 2.5.0, or apply backports 2.4.2 / 2.3.3 / 2.2.3 / 2.1.4. Commit fca9874 resolves the issue.
2. RESTRICT: Audit who can submit arbitrary tf.raw_ops calls in your environment and restrict that access to trusted identities.
3. ISOLATE: Run training jobs in separate containers/processes per user/team to limit blast radius.
4. DETECT: Monitor for unexpected TF process crashes or SIGFPE signals in training infrastructure logs.
5. HARDEN: Disable direct tf.raw_ops exposure in any public-facing model-serving API.

What systems are affected by CVE-2021-29524?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, ML development environments, shared compute clusters.

What is the CVSS score for CVE-2021-29524?

CVE-2021-29524 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.19%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, ML development environments, shared compute clusters

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0029 Denial of AI Service

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: A.6.2.6

NIST AI RMF: GOVERN 1.7, MANAGE 2.2

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a division by 0 in `tf.raw_ops.Conv2DBackpropFilter`. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/496c2630e51c1a478f095b084329acedb253db6b/tensorflow/core/kernels/conv_grad_shape_utils.cc#L130) does a modulus operation where the divisor is controlled by the caller. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
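
The general defect class can be illustrated outside TensorFlow. The sketch below is not the TensorFlow kernel code: the function and parameter names (check_depth_divisible, in_depth, filter_depth) are hypothetical, and it assumes the modulus is a divisibility check of one dimension against another. It only shows the pattern of validating a caller-controlled divisor before using it. In Python an unchecked modulus by zero raises ZeroDivisionError, whereas in native C++ integer code the same operation is undefined behavior and typically ends the process with SIGFPE.

```python
def check_depth_divisible(in_depth, filter_depth):
    """Validate a caller-supplied divisor before using it in a modulus.

    Hypothetical helper for illustration; not the TensorFlow implementation.
    """
    # Reject the divisor first, so the modulus below can never see zero.
    if filter_depth <= 0:
        raise ValueError("filter depth must be positive, got %d" % filter_depth)
    if in_depth % filter_depth != 0:
        raise ValueError("input depth must be a multiple of filter depth")
    return in_depth // filter_depth


print(check_depth_divisible(8, 4))
```

Returning a validation error for a zero divisor turns a process-wide crash into an ordinary, recoverable failure of the single call.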

Exploitation Scenario

A malicious insider or attacker who has compromised a shared ML development environment calls tf.raw_ops.Conv2DBackpropFilter() with input shape parameters engineered to produce a zero-valued modulus divisor. The kernel performs the modulus without validation, triggers a SIGFPE/division-by-zero exception, and kills the TensorFlow process. In a shared Jupyter server or HPC training cluster, this can be used to repeatedly abort other teams' training runs — effective sabotage of ML pipelines without leaving obvious traces beyond process crash logs.
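
Because the scenario leaves little trace beyond the crash itself, a job supervisor that records how each child process ended can help with detection. The following is a minimal sketch for POSIX systems, where Python's subprocess module reports death by signal as a negative return code. The helper names describe_exit and run_job are hypothetical, and a real training platform would more likely read this information from its scheduler or container runtime.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Classify a child exit status; negative values mean death by signal (POSIX)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number without a named constant on this platform.
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with code %d" % returncode


def run_job(argv):
    """Run a job as a child process and report how it ended."""
    completed = subprocess.run(argv)
    return describe_exit(completed.returncode)


# Harmless stand-in for a training job.
print(run_job([sys.executable, "-c", "pass"]))
```

A run of jobs reported as "killed by SIGFPE", especially across several users of the same runtime, would be worth correlating with who submitted raw op calls at the time.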