CVE-2021-29544: TensorFlow: DoS via missing tensor rank validation
MEDIUM | PoC AVAILABLE
A local attacker can crash TensorFlow processes by passing tensors with invalid rank to the QuantizeAndDequantizeV4Grad op, triggering a CHECK-fail abort in the C++ runtime. Exploitation requires local access, so the issue is most dangerous in shared ML compute environments such as multi-tenant Jupyter servers or GPU clusters where untrusted users can submit jobs. Upgrade to TensorFlow 2.4.2 or 2.5.0; the only workaround is input validation at the application layer.
What is the risk?
Medium risk overall, but highly context-dependent. In isolated single-user training environments the blast radius is minimal and the threat is largely theoretical. Risk escalates substantially in multi-tenant ML platforms where untrusted users can submit training or inference jobs, since a single malformed tensor call can crash the entire TF process and disrupt co-located workloads. No remote exploitation vector exists per the CVSS (AV:L), which limits exposure compared to network-reachable vulnerabilities.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| TensorFlow | pip | 2.4.x before 2.4.2 | 2.4.2, 2.5.0 |
You are affected if you run a TensorFlow 2.4 release earlier than 2.4.2.
What should I do?
1. Upgrade TensorFlow to 2.4.2 (cherry-picked backport) or 2.5.0+.
2. If immediate patching is blocked, enforce input tensor shape validation at the application boundary before tensors reach raw TF ops.
3. Implement process supervision (systemd, supervisord, Kubernetes restartPolicy) for TF serving processes to auto-recover from crashes.
4. Audit multi-tenant ML platforms for user isolation: restrict who can invoke `tf.raw_ops` directly and enforce job sandboxing.
5. Monitor for unexpected TF process crashes in serving infrastructure as a detection signal.
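The shape validation in step 2 can be sketched as a small pre-flight check. This is a hypothetical, TensorFlow-independent illustration: it works on plain shape tuples (for example `tuple(t.shape)`), and the names `require_rank`, `check_per_channel_ranges`, and `TensorRankError` are invented here and are not part of any TensorFlow API. It assumes, following the advisory, that the per-channel gradient path needs `input_min` and `input_max` to be rank 1; adjust the allowed ranks to match how your application calls the op.

```python
from typing import Iterable, Sequence


class TensorRankError(ValueError):
    """Raised when a shape's rank is outside the allowed set (hypothetical type)."""


def require_rank(name: str, shape: Sequence[int], allowed: Iterable[int]) -> None:
    """Raise TensorRankError unless len(shape) is one of the allowed ranks."""
    allowed_ranks = sorted(set(allowed))
    rank = len(shape)
    if rank not in allowed_ranks:
        raise TensorRankError(
            f"{name} has rank {rank}, expected one of {allowed_ranks}"
        )


def check_per_channel_ranges(
    input_min_shape: Sequence[int], input_max_shape: Sequence[int]
) -> None:
    """Assumption: per-channel quantization ranges must be rank-1 vectors."""
    require_rank("input_min", input_min_shape, allowed=(1,))
    require_rank("input_max", input_max_shape, allowed=(1,))
```

Calling `check_per_channel_ranges((3,), (3,))` returns normally, while a shape such as `(2, 2)` raises `TensorRankError`, a Python exception the caller can handle instead of a process abort.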
Frequently Asked Questions
What is CVE-2021-29544?
CVE-2021-29544 is a denial-of-service vulnerability in TensorFlow. `tf.raw_ops.QuantizeAndDequantizeV4Grad` does not validate the rank of its `input_*` tensors, so a local attacker can pass tensors of invalid rank and trigger a `CHECK`-fail abort in the C++ runtime. The fix is included in TensorFlow 2.4.2 and 2.5.0.
Is CVE-2021-29544 actively exploited?
Proof-of-concept exploit code for CVE-2021-29544 is publicly available, which lowers the barrier to exploitation. The EPSS score estimates the probability of exploitation at 0.31%.
How to fix CVE-2021-29544?
1. Upgrade TensorFlow to 2.4.2 (cherry-picked backport) or 2.5.0+.
2. If immediate patching is blocked, enforce input tensor shape validation at the application boundary before tensors reach raw TF ops.
3. Implement process supervision (systemd, supervisord, Kubernetes restartPolicy) for TF serving processes to auto-recover from crashes.
4. Audit multi-tenant ML platforms for user isolation: restrict who can invoke `tf.raw_ops` directly and enforce job sandboxing.
5. Monitor for unexpected TF process crashes in serving infrastructure as a detection signal.
What systems are affected by CVE-2021-29544?
This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, edge/mobile deployment pipelines.
What is the CVSS score for CVE-2021-29544?
CVE-2021-29544 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.31%.
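To show where the 5.5 comes from, the sketch below recomputes the base score from this CVE's vector, `CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H`, using the CVSS v3.1 base-score equations. It is a deliberately partial illustration: the weight table only covers the metric values that appear in this vector, it handles Scope: Unchanged only, and the function names are chosen for this example.

```python
import math

# CVSS v3.1 metric weights, restricted to the values used by this vector.
WEIGHTS = {
    "AV": {"L": 0.55},
    "AC": {"L": 0.77},
    "PR": {"L": 0.62},  # Low privileges when Scope is Unchanged
    "UI": {"N": 0.85},
    "C": {"N": 0.0},
    "I": {"N": 0.0},
    "A": {"H": 0.56},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged vector (partial sketch)."""
    # Skip the leading "CVSS:3.1" element, then map metric -> value.
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {metric: WEIGHTS[metric][parts[metric]] for metric in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))
```

For this vector the impact sub-score is about 3.60 and the exploitability sub-score about 1.83; their sum, roughly 5.43, rounds up to 5.5. The score is driven entirely by the High availability impact, since confidentiality and integrity are both None.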
What is the AI security impact?
Affected AI Architectures
MITRE ATLAS Techniques
- AML.T0010.001 AI Software
- AML.T0029 Denial of AI Service
Compliance Controls Affected
What are the technical details?
Original Advisory
TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a denial of service via a `CHECK`-fail in `tf.raw_ops.QuantizeAndDequantizeV4Grad`. This is because the implementation does not validate the rank of the `input_*` tensors. In turn, this results in the tensors being passed as they are to `QuantizeAndDequantizePerChannelGradientImpl`. However, the `vec<T>` method requires the rank to be 1 and triggers a `CHECK` failure otherwise. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2 as this is the only other affected version.
Exploitation Scenario
An attacker with local access to a shared ML compute node — such as a data scientist account on a multi-tenant Jupyter server — writes a script calling tf.raw_ops.QuantizeAndDequantizeV4Grad with input tensors of rank ≠ 1. The TensorFlow C++ runtime's vec<T>() method expects rank 1, triggers a CHECK failure, and aborts the entire TF process. In a shared inference server environment, this takes down all concurrent inference requests. In a training cluster without job isolation, the crash can disrupt other users' active training runs and corrupt unsaved checkpoints.
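Because a `CHECK` failure ends the process through an abort rather than a Python exception, a supervising process sees it as an abnormal exit, not as an error it can catch inside TensorFlow code. The sketch below is one hypothetical way to label such exits for monitoring. It relies on Python's `subprocess` convention that a negative return code `-N` means the child was terminated by signal `N` (POSIX only), and on the common shell and container convention of reporting `128 + N`; the function name and labels are invented for this example.

```python
import signal
from typing import Optional


def classify_exit(returncode: Optional[int]) -> str:
    """Label a child-process exit status for crash monitoring (sketch)."""
    if returncode is None:
        return "running"  # subprocess.Popen.poll() returns None while alive
    if returncode == 0:
        return "clean"
    # subprocess reports death-by-signal as -N; shells and many container
    # runtimes report it as 128 + N (assumed convention, verify for your stack).
    if returncode == -signal.SIGABRT or returncode == 128 + signal.SIGABRT:
        return "aborted"
    if returncode < 0:
        return "signaled"
    return "error"
```

A supervisor could call `classify_exit(proc.returncode)` after a worker exits and raise an alert when the label is `"aborted"`, since repeated aborts from a TF worker are the kind of signal described in this scenario.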
Weaknesses (CWE)
CWE-754 — Improper Check for Unusual or Exceptional Conditions: The product does not check or incorrectly checks for unusual or exceptional conditions that are not expected to occur frequently during day to day operation of the product.
- [Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Choose languages with features such as exception handling that force the programmer to anticipate unusual conditions that may generate exceptions. Custom exceptions may need to be developed to handle unusual business-logic conditions. Be careful not to pass sensitive exceptions back to the user (CWE-209, CWE-248).
- [Implementation] Check the results of all functions that return a value and verify that the value is expected.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
- github.com/tensorflow/tensorflow/blob/95078c145b5a7a43ee046144005f733092756ab5/tensorflow/core/kernels/quantize_and_dequantize_op.cc
- github.com/tensorflow/tensorflow/blob/95078c145b5a7a43ee046144005f733092756ab5/tensorflow/core/kernels/quantize_and_dequantize_op.h
- github.com/tensorflow/tensorflow/commit/20431e9044cf2ad3c0323c34888b192f3289af6b [Patch, 3rd Party]
- github.com/tensorflow/tensorflow/security/advisories/GHSA-6g85-3hm8-83f9 [Exploit, Patch, 3rd Party]
Related Vulnerabilities
- CVE-2020-15196 (9.9): TensorFlow: heap OOB read in sparse/ragged count ops (same package: tensorflow)
- CVE-2020-15205 (9.8): TensorFlow: heap overflow in StringNGrams, ASLR bypass (same package: tensorflow)
- CVE-2020-15208 (9.8): TFLite: OOB read/write via tensor dimension mismatch (same package: tensorflow)
- CVE-2019-16778 (9.8): TensorFlow: heap overflow in UnsortedSegmentSum op (same package: tensorflow)
- CVE-2022-23587 (9.8): TensorFlow: integer overflow in Grappler enables RCE (same package: tensorflow)