CVE-2021-29538: TensorFlow div-by-zero DoS

CISO Take

A trivially exploitable division-by-zero in TensorFlow's Conv2DBackpropFilter lets any authenticated user crash training jobs by submitting empty tensor shapes — no special skills required. Upgrade to TF 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4 immediately if running shared ML training infrastructure. Risk is elevated in multi-tenant platforms where users control model architecture inputs.

What is the risk?

Severity is medium overall, but it can be critical in context for shared ML training environments. The local attack vector (AV:L) limits remote exploitation in isolation, but cloud-based ML training services, internal Jupyter environments, and AutoML platforms make 'local' access trivially achievable by any authenticated user. Attack complexity is low and no special skills are required. Impact is confined to availability, with no data exposure or code execution, but repeated abuse can drain compute budgets and disrupt CI/CD ML pipelines.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	—	2.5.0, 2.4.2, 2.3.3, 2.2.3, 2.1.4

If you run a TensorFlow release older than the patched versions (2.5.0, 2.4.2, 2.3.3, 2.2.3, 2.1.4), you are affected.
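A quick way to triage an environment is to compare the installed version against the patched releases named in the advisory. The sketch below is a minimal illustration, not an official checker: the helper name `is_patched` is hypothetical, pre-release suffixes such as `rc1` are ignored, and branches older than 2.1 are assumed unpatched because the advisory lists no backport for them.

```python
# Patched releases named in the advisory, keyed by (major, minor) branch.
PATCHED = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def is_patched(version: str) -> bool:
    """Return True if `version` is at or above the fix for its release branch."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits of each component ("0rc1" -> 0).
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    major, minor, patch = parts

    # 2.5.0 and every later branch contain the fix.
    if (major, minor) >= (2, 5):
        return True
    # Assumption: branches without a listed backport are treated as unpatched.
    floor = PATCHED.get((major, minor))
    return floor is not None and patch >= floor
```

In a live environment the argument would typically be `tensorflow.__version__`.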

How severe is it?

CVSS 3.1

5.5 / 10

EPSS

0.2%

chance of exploitation in 30 days

Higher than 9% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Trivial

Exploitation Confidence

medium

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Local

AC Low

PR Low

UI None

S Unchanged

C None

I None

A High
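The 5.5 base score follows from these metric values under the CVSS v3.1 base-score equations. The sketch below reproduces the arithmetic for this vector only; it hard-codes the published v3.1 weights for the values listed above (scope unchanged) and is not a general CVSS calculator.

```python
import math

# CVSS v3.1 weights for AV:L / AC:L / PR:L / UI:N / C:N / I:N / A:H, scope unchanged.
WEIGHTS = {"AV": 0.55, "AC": 0.77, "PR": 0.62, "UI": 0.85,
           "C": 0.0, "I": 0.0, "A": 0.56}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(w: dict) -> float:
    # Impact sub-score: only availability contributes for this CVE.
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss  # scope-unchanged form of the equation
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score(WEIGHTS))  # 5.5
```

The impact term is about 3.60 and the exploitability term about 1.83, so the sum of roughly 5.43 rounds up to 5.5.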

What should I do?

5 steps

Patch: Upgrade TensorFlow to 2.5.0+ or apply available backport patches (2.4.2, 2.3.3, 2.2.3, 2.1.4).
Input validation: Enforce tensor shape validation before invoking Conv2D operations — reject empty or zero-dimension tensor shapes at ingestion boundaries.
Isolation: Run each training job in isolated containers or VMs to limit blast radius of induced crashes.
Detection: Alert on runtime exceptions in training workers matching division-by-zero patterns in conv_grad_filter_ops.cc.
Audit: Review any training endpoints accepting user-defined model architectures for unvalidated shape inputs.
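The input-validation step above can be sketched as a small check applied to every user-supplied shape before a job is accepted. This is an illustrative helper under stated assumptions: the function name `validate_conv_shape` is hypothetical, shapes are assumed to arrive as plain integer sequences, and unknown (`None`) dimensions are rejected as a conservative choice.

```python
from typing import Optional, Sequence


def validate_conv_shape(name: str, shape: Sequence[Optional[int]]) -> None:
    """Raise ValueError for shapes that would describe a tensor with no elements."""
    if len(shape) == 0:
        raise ValueError(f"{name}: shape is empty")
    for axis, dim in enumerate(shape):
        # Assumption: unknown dimensions are not allowed at the ingestion boundary.
        if dim is None or dim <= 0:
            raise ValueError(
                f"{name}: dimension {axis} is {dim!r}, expected a positive size"
            )


# Example: a well-formed NHWC input shape passes silently.
validate_conv_shape("input", [1, 28, 28, 3])
```

Running the check at the platform's ingestion boundary keeps a malformed job from ever reaching a training worker, which complements patching but does not replace it.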

How is it classified?

DoS

Framework

AML.T0010.001 - AI Software

AML.T0029 - Denial of AI Service

AML.T0034 - Cost Harvesting

AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

Clause 8.4 - AI system operation and monitoring

NIST AI RMF

MANAGE-2.4 - Residual risks from AI systems are monitored and managed

MAP-5.1 - Likelihood of AI risks is estimated as part of the risk assessment

Frequently Asked Questions

What is CVE-2021-29538?

CVE-2021-29538 is a division-by-zero flaw in TensorFlow's `Conv2DBackpropFilter`. The implementation computes a divisor, `work_unit_size`, from the shapes of the tensors passed as arguments. If all shapes are empty, the divisor is 0 and the unchecked division raises a runtime exception that can be abused for denial of service. The fix ships in TensorFlow 2.5.0, with backports in 2.4.2, 2.3.3, 2.2.3, and 2.1.4.

Is CVE-2021-29538 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2021-29538, increasing the risk of exploitation.

How to fix CVE-2021-29538?

1. Patch: Upgrade TensorFlow to 2.5.0+ or apply available backport patches (2.4.2, 2.3.3, 2.2.3, 2.1.4).
2. Input validation: Enforce tensor shape validation before invoking Conv2D operations and reject empty or zero-dimension tensor shapes at ingestion boundaries.
3. Isolation: Run each training job in isolated containers or VMs to limit blast radius of induced crashes.
4. Detection: Alert on runtime exceptions in training workers matching division-by-zero patterns in conv_grad_filter_ops.cc.
5. Audit: Review any training endpoints accepting user-defined model architectures for unvalidated shape inputs.

What systems are affected by CVE-2021-29538?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, shared ML platforms, AutoML systems, model fine-tuning infrastructure, CI/CD ML pipelines.

What is the CVSS score for CVE-2021-29538?

CVE-2021-29538 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.19%.

What is the AI security impact?

Affected AI Architectures

training pipelines, shared ML platforms, AutoML systems, model fine-tuning infrastructure, CI/CD ML pipelines

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0029 Denial of AI Service

AML.T0034 Cost Harvesting

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: Clause 8.4

NIST AI RMF: MANAGE-2.4, MAP-5.1

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. An attacker can cause a division by zero to occur in `Conv2DBackpropFilter`. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/1b0296c3b8dd9bd948f924aa8cd62f87dbb7c3da/tensorflow/core/kernels/conv_grad_filter_ops.cc#L513-L522) computes a divisor based on user provided data (i.e., the shape of the tensors given as arguments). If all shapes are empty then `work_unit_size` is 0. Since there is no check for this case before division, this results in a runtime exception, with potential to be abused for a denial of service. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
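The failure mode described in the advisory can be modeled in a few lines. The sketch below is a simplified Python stand-in, not the kernel's C++ code: only `work_unit_size` comes from the advisory, while `shard_size`, the parameter names, and the way the divisor is built from shape-derived sizes are assumptions chosen for illustration. It shows how a divisor derived purely from shapes collapses to 0 and how a guard placed before the division turns a crash into a clean error.

```python
def shard_size(target_working_set_size: int, output_image_size: int,
               filter_total_size: int, out_depth: int) -> int:
    """Ceiling-divide a working-set budget by a shape-derived work unit."""
    # Assumed model of the divisor: a sum of products of shape-derived sizes.
    work_unit_size = (output_image_size * out_depth
                      + filter_total_size * out_depth
                      + output_image_size * filter_total_size)

    # Guard before dividing: empty shapes make the divisor 0.
    if work_unit_size == 0:
        raise ValueError("tensor shapes yield an empty work unit")

    # Ceiling division of the budget by the work unit.
    return (target_working_set_size + work_unit_size - 1) // work_unit_size
```

Without the guard, calling the function with all sizes set to 0 would raise `ZeroDivisionError` in Python. In native code, an integer division by zero typically terminates the process instead.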

Exploitation Scenario

An attacker with access to a shared ML training platform — such as an internal Jupyter notebook server, an MLflow tracking server with job submission, or a cloud AutoML API — submits a training job containing a Conv2D layer configured with empty tensor shapes. During backpropagation, TensorFlow computes work_unit_size as 0 and immediately triggers a division-by-zero exception in Conv2DBackpropFilter, crashing the training worker process. The attacker can repeat this in a loop to continuously disrupt co-tenant training jobs or exhaust training worker capacity, effectively holding the ML training infrastructure in a degraded state at minimal cost.
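One way a platform could blunt the repeat-in-a-loop pattern described above is to count worker crashes per submitter and stop accepting jobs once a threshold is reached. The sketch below is a hypothetical illustration: the class name `CrashThrottle`, the default threshold, and the time window are all assumptions, and a real scheduler would need persistent state and its own policy for unblocking users.

```python
from collections import defaultdict, deque


class CrashThrottle:
    """Track worker crashes per submitter within a sliding time window."""

    def __init__(self, max_crashes: int = 3, window_seconds: float = 600.0):
        self.max_crashes = max_crashes
        self.window_seconds = window_seconds
        self._crashes = defaultdict(deque)  # submitter -> crash timestamps

    def record_crash(self, submitter: str, now: float) -> None:
        """Note that a job from `submitter` crashed a worker at time `now`."""
        self._crashes[submitter].append(now)

    def is_blocked(self, submitter: str, now: float) -> bool:
        """Return True if the submitter reached the crash limit inside the window."""
        events = self._crashes[submitter]
        # Drop crashes that have aged out of the window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.max_crashes
```

A scheduler would call `record_crash` from its worker-failure handler and consult `is_blocked` before queuing the next job from the same account.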