CVE-2021-29538: TensorFlow: div-by-zero DoS in Conv2DBackpropFilter
MEDIUM | PoC AVAILABLE
A trivially exploitable division by zero in TensorFlow's Conv2DBackpropFilter lets any authenticated user crash training jobs by submitting empty tensor shapes; no special skills are required. Upgrade to TF 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4 immediately if you run shared ML training infrastructure. Risk is elevated on multi-tenant platforms where users control model architecture inputs.
Risk Assessment
Medium severity overall, but contextually critical for shared ML training environments. The local attack vector (AV:L) limits remote exploitation in isolation, but cloud-based ML training services, internal Jupyter environments, and AutoML platforms give any authenticated user the 'local' access the attack needs. Attack complexity is low (AC:L), only low privileges are required (PR:L), and no user interaction or special skill is needed. Impact is confined to availability, with no data exposure or code execution, but repeated abuse can drain compute budgets and disrupt CI/CD ML pipelines.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| tensorflow | pip | < 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1 | 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0 |
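To check an installed version against the patched releases named in this advisory, a minimal sketch follows. The helper name `is_patched` is hypothetical, pre-release suffixes such as `rc0` are ignored, and treating release lines older than 2.1 as unpatched is an assumption based on the advisory listing no backport for them.

```python
import re

# Lowest patched patch-level for each backported release line,
# taken from the versions listed in the advisory text.
PATCHED_PATCH_LEVEL = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def is_patched(version: str) -> bool:
    """Return True if a TensorFlow version string includes the fix.

    Hypothetical helper: only the leading MAJOR.MINOR.PATCH is parsed.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) >= (2, 5):
        return True  # 2.5.0 and later ship the fix
    if (major, minor) in PATCHED_PATCH_LEVEL:
        return patch >= PATCHED_PATCH_LEVEL[(major, minor)]
    # Assumption: release lines before 2.1 received no backport.
    return False
```

In an environment with TensorFlow installed, this could be called as `is_patched(tf.__version__)`.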
Recommended Action
- Patch: Upgrade TensorFlow to 2.5.0+ or apply available backport patches (2.4.2, 2.3.3, 2.2.3, 2.1.4).
- Input validation: Enforce tensor shape validation before invoking Conv2D operations; reject empty or zero-dimension tensor shapes at ingestion boundaries.
- Isolation: Run each training job in isolated containers or VMs to limit blast radius of induced crashes.
- Detection: Alert on runtime exceptions in training workers matching division-by-zero patterns in conv_grad_filter_ops.cc.
- Audit: Review any training endpoints accepting user-defined model architectures for unvalidated shape inputs.
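The input-validation step can be sketched as a small pre-flight check at the ingestion boundary. This is an illustration under assumptions, not TensorFlow's own check: the function name `validate_conv2d_shapes` is hypothetical, shapes are assumed to arrive as plain sequences of integers, and the check is deliberately stricter than the upstream fix because it rejects any zero-sized dimension.

```python
from typing import Sequence


def validate_conv2d_shapes(**shapes: Sequence[int]) -> None:
    """Reject shapes that are not 4-D or that contain a non-positive dimension.

    Hypothetical helper: pass each shape by name, for example
    input=..., filter_sizes=..., out_backprop=...
    """
    for name, shape in shapes.items():
        dims = list(shape)
        if len(dims) != 4:
            raise ValueError(f"{name}: expected 4 dimensions, got {len(dims)}")
        if any(dim <= 0 for dim in dims):
            raise ValueError(f"{name}: zero or negative dimension in {dims}")
```

A job submission handler could call this before building the graph and return a client error when it raises `ValueError`, so that malformed shapes never reach a training worker.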
Frequently Asked Questions
What is CVE-2021-29538?
CVE-2021-29538 is a division-by-zero flaw in TensorFlow's Conv2DBackpropFilter op. The kernel derives a divisor (work_unit_size) from the shapes of its input tensors; when all shapes are empty the divisor is 0, and the unchecked division raises a runtime exception that crashes the training job. The flaw is fixed in TF 2.5.0, 2.4.2, 2.3.3, 2.2.3, and 2.1.4.
Is CVE-2021-29538 actively exploited?
Proof-of-concept exploit code is publicly available for CVE-2021-29538, increasing the risk of exploitation.
How to fix CVE-2021-29538?
1. Patch: Upgrade TensorFlow to 2.5.0+ or apply available backport patches (2.4.2, 2.3.3, 2.2.3, 2.1.4).
2. Input validation: Enforce tensor shape validation before invoking Conv2D operations; reject empty or zero-dimension tensor shapes at ingestion boundaries.
3. Isolation: Run each training job in isolated containers or VMs to limit blast radius of induced crashes.
4. Detection: Alert on runtime exceptions in training workers matching division-by-zero patterns in conv_grad_filter_ops.cc.
5. Audit: Review any training endpoints accepting user-defined model architectures for unvalidated shape inputs.
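The isolation and detection steps can be combined in a small supervisor that runs each job in its own process and classifies how it exited. The advisory only says the bug produces a "runtime exception"; the sketch below assumes a POSIX host where a native integer division by zero terminates the process with SIGFPE, and the function names are hypothetical.

```python
import signal
import subprocess
from typing import List


def classify_exit(returncode: int) -> str:
    """Map a subprocess return code to a coarse status label.

    On POSIX, subprocess reports death-by-signal as a negative number.
    """
    if returncode == 0:
        return "ok"
    if returncode == -signal.SIGFPE:
        # Assumption: the division by zero surfaces as SIGFPE on this host.
        return "sigfpe"
    return "failed"


def run_isolated(cmd: List[str]) -> str:
    """Run one training job in a child process and classify its exit."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return classify_exit(proc.returncode)
```

A scheduler could alert when the same submitter repeatedly produces `"sigfpe"` results, which matches the looped-crash pattern described for this CVE.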
What systems are affected by CVE-2021-29538?
This vulnerability affects the following AI/ML architecture patterns: training pipelines, shared ML platforms, AutoML systems, model fine-tuning infrastructure, CI/CD ML pipelines.
What is the CVSS score for CVE-2021-29538?
CVE-2021-29538 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.03%.
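The 5.5 base score can be reproduced from this CVE's vector, CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H, using the CVSS v3.1 base-score equations. The sketch below is a partial calculator under a stated limitation: it handles only vectors with unchanged scope (S:U), which is all this CVE needs.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for S:U).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for an unchanged-scope CVSS v3.1 vector."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise NotImplementedError("this sketch only covers S:U vectors")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))
```

For this vector the impact sub-score is about 3.60 and the exploitability sub-score about 1.83; their sum, roughly 5.43, rounds up to 5.5.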
Technical Details
NVD Description
TensorFlow is an end-to-end open source platform for machine learning. An attacker can cause a division by zero to occur in `Conv2DBackpropFilter`. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/1b0296c3b8dd9bd948f924aa8cd62f87dbb7c3da/tensorflow/core/kernels/conv_grad_filter_ops.cc#L513-L522) computes a divisor based on user provided data (i.e., the shape of the tensors given as arguments). If all shapes are empty then `work_unit_size` is 0. Since there is no check for this case before division, this results in a runtime exception, with potential to be abused for a denial of service. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
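The failure mode described above can be modeled in a few lines. This is a simplified Python model, not the TensorFlow kernel: only the name `work_unit_size` comes from the advisory, while the function name, its parameters, and the particular terms summed are assumptions chosen to show how a divisor built from shape-derived sizes collapses to 0 when every shape is empty, and how a guard before the division turns a crash into a rejected request.

```python
def shard_size(
    output_image_size: int,
    filter_total_size: int,
    out_depth: int,
    target_working_set_size: int,
) -> int:
    """Ceiling-divide a working-set budget by a shape-derived work unit.

    Hypothetical model: every input is derived from user-supplied shapes.
    """
    # Each term is a product of shape-derived sizes, so empty shapes zero it.
    size_a = output_image_size * out_depth
    size_b = filter_total_size * out_depth
    size_c = output_image_size * filter_total_size
    work_unit_size = size_a + size_b + size_c

    # The guard that the vulnerable code path lacked: refuse a zero divisor.
    if work_unit_size == 0:
        raise ValueError("work_unit_size is 0; refusing to divide")

    return (target_working_set_size + work_unit_size - 1) // work_unit_size
```

Without the guard, the final line would divide by zero whenever all three sizes are 0, which is the condition the advisory describes.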
Exploitation Scenario
An attacker with access to a shared ML training platform — such as an internal Jupyter notebook server, an MLflow tracking server with job submission, or a cloud AutoML API — submits a training job containing a Conv2D layer configured with empty tensor shapes. During backpropagation, TensorFlow computes work_unit_size as 0 and immediately triggers a division-by-zero exception in Conv2DBackpropFilter, crashing the training worker process. The attacker can repeat this in a loop to continuously disrupt co-tenant training jobs or exhaust training worker capacity, effectively holding the ML training infrastructure in a degraded state at minimal cost.
Weaknesses (CWE)
CWE-369: Divide By Zero
CVSS Vector
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
- github.com/tensorflow/tensorflow/commit/c570e2ecfc822941335ad48f6e10df4e21f11c96 (Patch, 3rd Party)
- github.com/tensorflow/tensorflow/security/advisories/GHSA-j8qc-5fqr-52fp (Exploit, Patch, 3rd Party)
Related Vulnerabilities
- CVE-2020-15196 (9.9) TensorFlow: heap OOB read in sparse/ragged count ops. Same package: tensorflow
- CVE-2020-15205 (9.8) TensorFlow: heap overflow in StringNGrams, ASLR bypass. Same package: tensorflow
- CVE-2020-15208 (9.8) TFLite: OOB read/write via tensor dimension mismatch. Same package: tensorflow
- CVE-2019-16778 (9.8) TensorFlow: heap overflow in UnsortedSegmentSum op. Same package: tensorflow
- CVE-2022-23587 (9.8) TensorFlow: integer overflow in Grappler enables RCE. Same package: tensorflow