CVE-2021-29544: TensorFlow: DoS via missing tensor rank validation

MEDIUM PoC AVAILABLE
Published May 14, 2021
CISO Take

A local attacker can crash TensorFlow processes by passing tensors with invalid rank to the QuantizeAndDequantizeV4Grad op, triggering a CHECK-fail abort in the C++ runtime. Exploitability is limited to local access, making this most dangerous in shared ML compute environments such as multi-tenant Jupyter servers or GPU clusters where untrusted users can submit jobs. Patch to TensorFlow 2.4.2 or 2.5.0 — no workaround exists beyond input sanitization at the application layer.

What is the risk?

Medium risk overall, but highly context-dependent. In isolated single-user training environments the blast radius is minimal and the threat is largely theoretical. Risk escalates substantially in multi-tenant ML platforms where untrusted users can submit training or inference jobs, since a single malformed tensor call can crash the entire TF process and disrupt co-located workloads. No remote exploitation vector exists per the CVSS (AV:L), which limits exposure compared to network-reachable vulnerabilities.

What systems are affected?

Package | Ecosystem | Vulnerable Range | Patched
TensorFlow | pip | not listed | 2.4.2, 2.5.0

Do you use TensorFlow? You are affected if your installed release lacks the fix, which the advisory places in 2.4.2 and 2.5.0.
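One way to triage an installed version is to compare it against the releases named in the advisory. The following is a minimal, unofficial sketch: `parse_release` and `lacks_fix` are hypothetical helper names, the version bounds assume (per the advisory text) that the 2.4 line before 2.4.2 is the only affected release line besides pre-2.5.0 development builds, and pre-release suffixes such as `rc0` are ignored.

```python
import re

# Assumption taken from the advisory text: the fix ships in 2.5.0 and is
# cherry-picked onto 2.4.2, and 2.4.x is the only other affected release line.
FIRST_AFFECTED = (2, 4, 0)
FIRST_FIXED_IN_LINE = (2, 4, 2)


def parse_release(version: str) -> tuple:
    """Extract (major, minor, patch) from a version string such as '2.4.1'.

    Pre-release or local suffixes (for example 'rc0') are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def lacks_fix(version: str) -> bool:
    """Return True if the version falls in the assumed affected range."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_FIXED_IN_LINE
```

With TensorFlow installed, the check could be driven by `lacks_fix(tf.__version__)`; treat a `True` result as a prompt to upgrade rather than as proof of exposure.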

How severe is it?

CVSS 3.1: 5.5 / 10
EPSS: 0.3% chance of exploitation in 30 days (higher than 22% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium (public PoC indexed in trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

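AV (Attack Vector): Local
AC (Attack Complexity): Low
PR (Privileges Required): Low
UI (User Interaction): None
S (Scope): Unchanged
C (Confidentiality): None
I (Integrity): None
A (Availability): High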

What should I do?

5 steps
  1. Upgrade TensorFlow to 2.4.2 (cherry-picked backport) or 2.5.0+.

  2. If immediate patching is blocked, enforce input tensor shape validation at the application boundary before tensors reach raw TF ops.

  3. Implement process supervision (systemd, supervisord, Kubernetes restartPolicy) for TF serving processes to auto-recover from crashes.

  4. Audit multi-tenant ML platforms for user isolation — restrict who can invoke tf.raw_ops directly and enforce job sandboxing.

  5. Monitor for unexpected TF process crashes in serving infrastructure as a detection signal.
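The shape validation in step 2 can be sketched as a small guard that runs before any tensor reaches the raw op. This is an illustrative sketch, not TensorFlow's own check: `check_qdq_grad_ranks` is a hypothetical name, it works on plain shape tuples so it does not depend on TensorFlow, and the accepted ranks are an assumption (rank 1 for the per-channel path, which the advisory says goes through `vec<T>`, and rank 0 when `axis` is -1).

```python
def check_qdq_grad_ranks(input_min_shape, input_max_shape, axis=-1):
    """Reject min/max shapes whose rank does not match the assumed contract.

    Assumption (not confirmed by the advisory): with axis == -1 the op is
    per-tensor and expects scalar min/max (rank 0); with axis >= 0 it is
    per-channel and expects rank-1 min/max.
    """
    expected_rank = 0 if axis == -1 else 1
    shapes = (("input_min", input_min_shape), ("input_max", input_max_shape))
    for name, shape in shapes:
        rank = len(tuple(shape))
        if rank != expected_rank:
            # Raising here turns a would-be process abort into a normal,
            # catchable Python error at the application boundary.
            raise ValueError(
                f"{name} has rank {rank}, expected rank {expected_rank} "
                f"for axis={axis}"
            )
```

For eager tensors the shapes could be supplied as `tuple(tensor.shape)`. The guard is deliberately stricter than the op may require; rejecting a borderline input is cheaper than losing the process.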

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Article 15 - Accuracy, robustness and cybersecurity
ISO 42001: A.6.1 - AI system design and robustness
NIST AI RMF: MANAGE 2.2 - Mechanisms are in place to sustain AI system value and manage AI risks
OWASP LLM Top 10: LLM10:2025 - Unbounded Consumption

Frequently Asked Questions

What is CVE-2021-29544?

CVE-2021-29544 is a denial-of-service vulnerability in TensorFlow. The QuantizeAndDequantizeV4Grad op does not validate the rank of its input tensors, so a local attacker can pass tensors of invalid rank and trigger a CHECK-fail abort in the C++ runtime, crashing the process. The exposure is greatest in shared ML compute environments such as multi-tenant Jupyter servers or GPU clusters where untrusted users can submit jobs. The fix is included in TensorFlow 2.4.2 and 2.5.0.

Is CVE-2021-29544 actively exploited?

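The exploitation status recorded for CVE-2021-29544 is "Exploit Available": proof-of-concept code is public (indexed in trickest/cve), which raises the likelihood of exploitation. The EPSS score of 0.31% indicates a low predicted probability of exploitation in the next 30 days.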

How to fix CVE-2021-29544?

  1. Upgrade TensorFlow to 2.4.2 (cherry-picked backport) or 2.5.0+.

  2. If immediate patching is blocked, enforce input tensor shape validation at the application boundary before tensors reach raw TF ops.

  3. Implement process supervision (systemd, supervisord, Kubernetes restartPolicy) for TF serving processes to auto-recover from crashes.

  4. Audit multi-tenant ML platforms for user isolation: restrict who can invoke tf.raw_ops directly and enforce job sandboxing.

  5. Monitor for unexpected TF process crashes in serving infrastructure as a detection signal.

What systems are affected by CVE-2021-29544?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, edge/mobile deployment pipelines.

What is the CVSS score for CVE-2021-29544?

CVE-2021-29544 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.31%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, edge/mobile deployment pipelines

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0029 Denial of AI Service

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.1
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM10:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a denial of service via a `CHECK`-fail in `tf.raw_ops.QuantizeAndDequantizeV4Grad`. This is because the implementation does not validate the rank of the `input_*` tensors. In turn, this results in the tensors being passed as they are to `QuantizeAndDequantizePerChannelGradientImpl`. However, the `vec<T>` method requires the rank to be 1 and triggers a `CHECK` failure otherwise. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2 as this is the only other affected version.

Exploitation Scenario

An attacker with local access to a shared ML compute node — such as a data scientist account on a multi-tenant Jupyter server — writes a script calling tf.raw_ops.QuantizeAndDequantizeV4Grad with input tensors of rank ≠ 1. The TensorFlow C++ runtime's vec<T>() method expects rank 1, triggers a CHECK failure, and aborts the entire TF process. In a shared inference server environment, this takes down all concurrent inference requests. In a training cluster without job isolation, the crash can disrupt other users' active training runs and corrupt unsaved checkpoints.
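Because the failure aborts the whole process rather than raising an error, one containment option is to run untrusted op invocations in a child process and inspect how it exited. The sketch below is an assumption-laden illustration, not part of the advisory: `run_isolated` is a hypothetical helper, it assumes a POSIX system where an abort surfaces as a negative return code, and it does not itself import TensorFlow.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> dict:
    """Run a Python snippet in a child interpreter and report how it exited.

    On POSIX, subprocess reports death by signal as a negative return code,
    so an abort in the child shows up as killed_by == "SIGABRT" instead of
    taking the parent process down with it.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    killed_by = None
    if proc.returncode < 0:
        killed_by = signal.Signals(-proc.returncode).name
    return {
        "returncode": proc.returncode,
        "killed_by": killed_by,
        "stderr": proc.stderr,
    }
```

As a harmless stand-in for the crash, `run_isolated("import os; os.abort()")` lets the parent observe the abort and keep running. The `killed_by` field can also feed the crash monitoring recommended in the remediation steps.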

Weaknesses (CWE)

CWE-754 — Improper Check for Unusual or Exceptional Conditions: The product does not check or incorrectly checks for unusual or exceptional conditions that are not expected to occur frequently during day to day operation of the product.

  • [Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Choose languages with features such as exception handling that force the programmer to anticipate unusual conditions that may generate exceptions. Custom exceptions may need to be developed to handle unusual business-logic conditions. Be careful not to pass sensitive exceptions back to the user (CWE-209, CWE-248).
  • [Implementation] Check the results of all functions that return a value and verify that the value is expected.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
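The 5.5 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The following is a compact sketch for illustration only: `parse_vector`, `roundup`, and `base_score` are hypothetical names, only base metrics are handled, and the metric weights are those published in the CVSS v3.1 specification.

```python
# Metric weights from the CVSS v3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def parse_vector(vector: str) -> dict:
    """Split 'CVSS:3.1/AV:L/...' into a {metric: value} mapping."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix!r}")
    return dict(part.split(":", 1) for part in parts)


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (scaled // 10000 + 1) / 10.0


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    changed = m["S"] == "C"
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[m["S"]][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))
```

For this vector the impact term is 6.42 × 0.56 ≈ 3.60 and the exploitability term is 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.83; their sum, about 5.43, rounds up to 5.5.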

Timeline

Published: May 14, 2021
Last Modified: November 21, 2024
First Seen: May 14, 2021

Related Vulnerabilities