CVE-2021-29538: TensorFlow: div-by-zero DoS in Conv2DBackpropFilter

Severity: MEDIUM | PoC available
Published May 14, 2021
CISO Take

A trivially exploitable division-by-zero in TensorFlow's Conv2DBackpropFilter lets any authenticated user crash training jobs by submitting empty tensor shapes — no special skills required. Upgrade to TF 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4 immediately if running shared ML training infrastructure. Risk is elevated in multi-tenant platforms where users control model architecture inputs.

Risk Assessment

Medium severity overall, but contextually critical for shared ML training environments. The local attack vector (AV:L) rules out direct remote exploitation, but on cloud-based ML training services, internal Jupyter environments, and AutoML platforms any authenticated user already has the 'local' access the attack requires. Attack complexity is low and no special skills are required. Impact is limited to availability (no data exposure or code execution), but repeated abuse can drain compute budgets and disrupt CI/CD ML pipelines.

Affected Systems

Package: tensorflow (pip)
Vulnerable range: < 2.1.4; >= 2.2.0, < 2.2.3; >= 2.3.0, < 2.3.3; >= 2.4.0, < 2.4.2
Patched: 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

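You're affected if you run a TensorFlow release older than 2.5.0, unless it is at or above the patched release for its line (2.4.2, 2.3.3, 2.2.3, or 2.1.4).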
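To check an installation against the fixed releases named in the advisory (2.5.0, 2.4.2, 2.3.3, 2.2.3, 2.1.4), a minimal sketch like the following compares a version string against those thresholds. The function names are hypothetical, and it assumes that lines with no listed backport (1.x, 2.0.x) remain vulnerable and that pre-release suffixes such as "rc1" can be ignored, which may misclassify release candidates.

```python
from __future__ import annotations

import re

# Fixed patch level per older minor line, as named in the advisory text.
# Every release from 2.5.0 onward is treated as fixed.
PATCHED_BACKPORTS = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def parse_version(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch); suffixes such as 'rc1' are ignored."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_patched_for_cve_2021_29538(version: str) -> bool:
    """Return True if `version` is at or above the fixed release on its line."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= (2, 5):
        return True
    min_patch = PATCHED_BACKPORTS.get((major, minor))
    # Assumption: lines with no backport listed (e.g. 1.x, 2.0.x) stay vulnerable.
    return min_patch is not None and patch >= min_patch
```

In an environment where TensorFlow is installed, pass `tf.__version__` to `is_patched_for_cve_2021_29538` after `import tensorflow as tf`.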

Severity & Risk

CVSS 3.1: 5.5 / 10 (Medium)
EPSS: 0.03% chance of exploitation in the next 30 days (higher than 8% of all CVEs)
Exploitation status: Exploit available
Exploitation: Medium
Sophistication: Trivial
Exploitation confidence: Medium (public PoC indexed in trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

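Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High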

Recommended Action

  1. Patch: Upgrade TensorFlow to 2.5.0 or later or, on older supported lines, to the patched releases 2.4.2, 2.3.3, 2.2.3, or 2.1.4.

  2. Input validation: Validate tensor shapes before invoking Conv2D gradient operations, rejecting empty tensors and shapes with zero-sized dimensions at ingestion boundaries.

  3. Isolation: Run each training job in an isolated container or VM to limit the blast radius of induced crashes.

  4. Detection: Alert on training-worker crashes originating in Conv2DBackpropFilter (conv_grad_filter_ops.cc), for example processes killed by SIGFPE, the usual symptom of an integer division by zero on x86-64 Linux.

  5. Audit: Review any training endpoints accepting user-defined model architectures for unvalidated shape inputs.
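The input-validation step can be sketched as a pre-flight check on concrete shapes before they reach the op. This is a minimal Python sketch, not TensorFlow's API: the function name is hypothetical, and it assumes the three arguments of Conv2DBackpropFilter are a rank-4 input shape, the four `filter_sizes` values, and a rank-4 `out_backprop` shape. It is deliberately stricter than necessary and rejects any zero-sized dimension.

```python
from __future__ import annotations

from typing import Sequence


def validate_conv2d_backprop_filter_shapes(
    input_shape: Sequence[int],
    filter_sizes: Sequence[int],
    out_backprop_shape: Sequence[int],
) -> None:
    """Raise ValueError unless every shape has 4 entries, all positive."""
    named = {
        "input": input_shape,
        "filter_sizes": filter_sizes,
        "out_backprop": out_backprop_shape,
    }
    for name, shape in named.items():
        if len(shape) != 4:
            raise ValueError(f"{name} must have 4 dimensions, got {list(shape)}")
        # Empty tensors (any zero-sized dimension) are what drive the divisor to 0.
        if any(dim <= 0 for dim in shape):
            raise ValueError(f"{name} has an empty or non-positive dimension: {list(shape)}")
```

Run this at the boundary where user-supplied architectures or tensors enter the platform, on fully defined shapes (symbolic `None` dimensions would need separate handling).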

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
Clause 8.4 - AI system operation and monitoring
NIST AI RMF
MANAGE-2.4 - Residual risks from AI systems are monitored and managed
MAP-5.1 - Likelihood of AI risks is estimated as part of the risk assessment

Frequently Asked Questions

What is CVE-2021-29538?

CVE-2021-29538 is a division-by-zero bug in TensorFlow's Conv2DBackpropFilter kernel. The kernel derives a divisor (work_unit_size) from the shapes of its input tensors; when all of them are empty the divisor is 0, and the unchecked division crashes the process, giving any authenticated user a denial of service against training jobs. It is fixed in TensorFlow 2.5.0, 2.4.2, 2.3.3, 2.2.3, and 2.1.4.

Is CVE-2021-29538 actively exploited?

Public proof-of-concept exploit code for CVE-2021-29538 is available (indexed in trickest/cve), which raises the risk of exploitation, although the 30-day EPSS probability remains low (0.03%).

How to fix CVE-2021-29538?

Upgrade to TensorFlow 2.5.0 or later, or to 2.4.2, 2.3.3, 2.2.3, or 2.1.4 on older supported lines. As defense in depth, validate tensor shapes before invoking Conv2D gradient operations, run training jobs in isolated containers or VMs, alert on worker crashes in Conv2DBackpropFilter, and audit training endpoints that accept user-defined model architectures for unvalidated shape inputs.
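Isolation and detection can be combined by running each job in its own process and classifying how it exited. The Python sketch below assumes a POSIX host and assumes the crash surfaces as SIGFPE, which is typical for integer division by zero on x86-64 Linux but not guaranteed on every architecture; the function name and the "ok"/"sigfpe"/"failed" labels are hypothetical, and wiring the result into an alerting system is left to the caller.

```python
from __future__ import annotations

import signal
import subprocess
import sys


def run_training_job(cmd: list[str], timeout: float | None = None) -> str:
    """Run `cmd` in a separate process and classify how it exited.

    Returns "ok", "sigfpe" (killed by SIGFPE, a likely division by zero),
    or "failed" for any other non-zero exit.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if proc.returncode == 0:
        return "ok"
    # On POSIX, a negative return code -N means the child was killed by signal N.
    if proc.returncode == -signal.SIGFPE:
        return "sigfpe"
    return "failed"


# The current interpreter is a convenient child command for quick checks.
PYTHON = sys.executable
```

For example, `run_training_job([PYTHON, "train.py"])` returning `"sigfpe"` (with `train.py` standing in for a real job script) is the signal worth alerting on; a crash confined to that child process leaves the supervisor and co-tenant jobs running.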

What systems are affected by CVE-2021-29538?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, shared ML platforms, AutoML systems, model fine-tuning infrastructure, CI/CD ML pipelines.

What is the CVSS score for CVE-2021-29538?

CVE-2021-29538 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.03%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. An attacker can cause a division by zero to occur in `Conv2DBackpropFilter`. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/1b0296c3b8dd9bd948f924aa8cd62f87dbb7c3da/tensorflow/core/kernels/conv_grad_filter_ops.cc#L513-L522) computes a divisor based on user provided data (i.e., the shape of the tensors given as arguments). If all shapes are empty then `work_unit_size` is 0. Since there is no check for this case before division, this results in a runtime exception, with potential to be abused for a denial of service. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
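The failure mode reduces to a divisor computed purely from tensor dimensions. The Python model below is a simplification under stated assumptions: the helper names, the sum-of-products formula, and the `target_working_set_size` parameter are hypothetical and only loosely follow the linked kernel. It shows why empty inputs give a zero divisor, and how a guard that rejects the input before dividing turns a crash into a recoverable error.

```python
from __future__ import annotations


def work_unit_size(output_image_size: int, filter_total_size: int, out_depth: int) -> int:
    """Hypothetical per-work-unit cost built only from tensor dimensions.

    When every input tensor is empty, each product below is 0, so the sum is 0.
    """
    return (
        output_image_size * filter_total_size
        + filter_total_size * out_depth
        + output_image_size * out_depth
    )


def shard_size_unchecked(target_working_set_size: int, unit: int) -> int:
    # Vulnerable pattern: ceiling division with no check on the divisor.
    # Python raises ZeroDivisionError; in C++, integer division by zero is
    # undefined behavior and in practice usually kills the process.
    return (target_working_set_size + unit - 1) // unit


def shard_size_checked(target_working_set_size: int, unit: int) -> int:
    # Guarded pattern: reject a non-positive divisor with a recoverable error.
    if unit <= 0:
        raise ValueError("input, filter_sizes and out_backprop must all be non-empty")
    return (target_working_set_size + unit - 1) // unit
```

Raising an error is the point of the guard: the caller receives an invalid-argument style failure instead of losing the whole worker process.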

Exploitation Scenario

An attacker with access to a shared ML training platform — such as an internal Jupyter notebook server, an MLflow tracking server with job submission, or a cloud AutoML API — submits a training job containing a Conv2D layer configured with empty tensor shapes. During backpropagation, TensorFlow computes work_unit_size as 0 and immediately triggers a division-by-zero exception in Conv2DBackpropFilter, crashing the training worker process. The attacker can repeat this in a loop to continuously disrupt co-tenant training jobs or exhaust training worker capacity, effectively holding the ML training infrastructure in a degraded state at minimal cost.

Weaknesses (CWE)

CWE-369: Divide By Zero

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
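The 5.5 base score follows from the CVSS v3.1 base equations published by FIRST. This minimal Python sketch recomputes it from the vector string; it handles Scope: Unchanged only (an intentional simplification, since Scope: Changed uses different Privileges Required weights and impact formula), and the function names are illustrative.

```python
from __future__ import annotations

import math

# CVSS v3.1 metric weights for Scope: Unchanged.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Smallest one-decimal number >= value, computed on integers to avoid float error."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def parse_vector(vector: str) -> dict[str, str]:
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"not a CVSS 3.1 vector: {vector!r}")
    return dict(part.split(":", 1) for part in parts)


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    if m["S"] != "U":
        raise NotImplementedError("this sketch only handles Scope: Unchanged")
    # Impact Sub-Score combines the three CIA impact weights.
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    impact = 6.42 * iss
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * WEIGHTS["PR"][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))
```

For this vector, ISS = 0.56, Impact = 6.42 × 0.56 ≈ 3.60, Exploitability = 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.83, and Roundup(5.43) = 5.5. The low Local/Low/Low exploitability terms are what keep a high-availability-impact bug in the Medium band.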

Timeline

Published
May 14, 2021
Last Modified
November 21, 2024
First Seen
May 14, 2021

Related Vulnerabilities