CVE-2021-29544: TensorFlow: DoS via missing tensor rank validation

MEDIUM PoC AVAILABLE
Published May 14, 2021
CISO Take

A local attacker can crash TensorFlow processes by passing tensors with invalid rank to the QuantizeAndDequantizeV4Grad op, triggering a CHECK-fail abort in the C++ runtime. Exploitability is limited to local access, making this most dangerous in shared ML compute environments such as multi-tenant Jupyter servers or GPU clusters where untrusted users can submit jobs. Patch to TensorFlow 2.4.2 or 2.5.0 — no workaround exists beyond input sanitization at the application layer.

Risk Assessment

Medium risk overall, but highly context-dependent. In isolated single-user training environments the blast radius is minimal and the threat is largely theoretical. Risk escalates substantially in multi-tenant ML platforms where untrusted users can submit training or inference jobs, since a single malformed tensor call can crash the entire TF process and disrupt co-located workloads. No remote exploitation vector exists per the CVSS (AV:L), which limits exposure compared to network-reachable vulnerabilities.

Affected Systems

Package: tensorflow
Ecosystem: pip
Vulnerable Range: (not listed)
Patched: 2.4.2, 2.5.0

Severity & Risk

CVSS 3.1
5.5 / 10
EPSS
0.03%
chance of exploitation in 30 days
Higher than 8% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Trivial
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV Local
AC Low
PR Low
UI None
S Unchanged
C None
I None
A High

Recommended Action

  1. Upgrade TensorFlow to 2.4.2 (cherry-picked backport) or 2.5.0+.

  2. If immediate patching is blocked, enforce input tensor shape validation at the application boundary before tensors reach raw TF ops.

  3. Implement process supervision (systemd, supervisord, Kubernetes restartPolicy) for TF serving processes to auto-recover from crashes.

  4. Audit multi-tenant ML platforms for user isolation — restrict who can invoke tf.raw_ops directly and enforce job sandboxing.

  5. Monitor for unexpected TF process crashes in serving infrastructure as a detection signal.
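The shape validation in step 2 could look roughly like the following. This is a minimal sketch, not the upstream fix: the names `check_quantize_grad_shapes`, `guarded_quantize_grad`, and `InvalidQuantizeGradInput` are hypothetical, and the accepted shapes (a single-element range in per-tensor mode, a rank-1 range with one entry per channel otherwise) are an assumption based on the rank-1 requirement in the NVD description.

```python
from typing import Sequence


class InvalidQuantizeGradInput(ValueError):
    """Raised instead of letting a malformed tensor reach the raw op."""


def check_quantize_grad_shapes(
    input_shape: Sequence[int],
    input_min_shape: Sequence[int],
    input_max_shape: Sequence[int],
    axis: int = -1,
) -> None:
    """Reject range tensors whose rank the op is assumed not to handle.

    Assumption: with axis == -1 the ranges hold a single element; with an
    explicit axis they are rank 1 with one entry per channel.
    """
    if axis != -1 and not 0 <= axis < len(input_shape):
        raise InvalidQuantizeGradInput(f"axis {axis} out of range for input rank {len(input_shape)}")

    ranges = (("input_min", tuple(input_min_shape)), ("input_max", tuple(input_max_shape)))
    for name, shape in ranges:
        if axis == -1:
            if shape not in ((), (1,)):
                raise InvalidQuantizeGradInput(f"{name} must hold one element, got shape {shape}")
        elif len(shape) != 1:
            raise InvalidQuantizeGradInput(f"{name} must have rank 1, got rank {len(shape)}")
        elif shape[0] != input_shape[axis]:
            raise InvalidQuantizeGradInput(
                f"{name} has {shape[0]} entries, expected {input_shape[axis]}"
            )


def guarded_quantize_grad(gradients, inputs, input_min, input_max, axis=-1):
    """Validate shapes in Python, then call the raw op."""
    import tensorflow as tf  # imported lazily so the validator itself stays TF-free

    inputs, input_min, input_max = (
        tf.convert_to_tensor(t) for t in (inputs, input_min, input_max)
    )
    check_quantize_grad_shapes(
        inputs.shape.as_list(), input_min.shape.as_list(), input_max.shape.as_list(), axis
    )
    return tf.raw_ops.QuantizeAndDequantizeV4Grad(
        gradients=gradients, input=inputs, input_min=input_min, input_max=input_max, axis=axis
    )
```

A guard like this turns the abort into an ordinary Python exception that the caller can catch and log. It only helps where callers go through the wrapper; a user who can call tf.raw_ops directly bypasses it, which is why step 4 still matters.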

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.1 - AI system design and robustness
NIST AI RMF
MANAGE 2.2 - Mechanisms are in place to sustain AI system value and manage AI risks
OWASP LLM Top 10
LLM10:2025 - Unbounded Consumption

Frequently Asked Questions

What is CVE-2021-29544?

CVE-2021-29544 is a denial-of-service vulnerability in TensorFlow. The tf.raw_ops.QuantizeAndDequantizeV4Grad op does not validate the rank of its input tensors, so a local attacker can pass tensors of invalid rank and trigger a CHECK-fail abort in the C++ runtime, which crashes the process. It is most dangerous in shared ML compute environments such as multi-tenant Jupyter servers or GPU clusters where untrusted users can submit jobs. The fix is in TensorFlow 2.4.2 and 2.5.0.

Is CVE-2021-29544 actively exploited?

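This page records no in-the-wild exploitation: the exploitation status is "Exploit Available" and the EPSS probability is 0.03%. Proof-of-concept code is publicly indexed (trickest/cve), however, so exploitation takes little effort for anyone who already has local access.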

How to fix CVE-2021-29544?

  1. Upgrade TensorFlow to 2.4.2 (cherry-picked backport) or 2.5.0+.

  2. If immediate patching is blocked, enforce input tensor shape validation at the application boundary before tensors reach raw TF ops.

  3. Implement process supervision (systemd, supervisord, Kubernetes restartPolicy) for TF serving processes to auto-recover from crashes.

  4. Audit multi-tenant ML platforms for user isolation: restrict who can invoke tf.raw_ops directly and enforce job sandboxing.

  5. Monitor for unexpected TF process crashes in serving infrastructure as a detection signal.

What systems are affected by CVE-2021-29544?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, edge/mobile deployment pipelines.

What is the CVSS score for CVE-2021-29544?

CVE-2021-29544 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.03%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a denial of service via a `CHECK`-fail in `tf.raw_ops.QuantizeAndDequantizeV4Grad`. This is because the implementation does not validate the rank of the `input_*` tensors. In turn, this results in the tensors being passed as they are to `QuantizeAndDequantizePerChannelGradientImpl`. However, the `vec<T>` method requires the rank to be 1 and triggers a `CHECK` failure otherwise. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2 as this is the only other affected version.

Exploitation Scenario

An attacker with local access to a shared ML compute node (for example, a data scientist account on a multi-tenant Jupyter server) writes a script that calls tf.raw_ops.QuantizeAndDequantizeV4Grad with input tensors whose rank is not 1. The TensorFlow C++ runtime's vec<T>() method expects rank 1, so it triggers a CHECK failure and aborts the entire TF process. On a shared inference server, this drops every in-flight inference request handled by that process. In a training cluster without job isolation, the crash can interrupt other users' training runs, which lose any progress made since their last saved checkpoint.
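The scenario above depends on untrusted code sharing a process with other workloads. One way to contain the abort is to run each untrusted job in its own child process and inspect how it exited. The sketch below assumes a POSIX host, where a child killed by SIGABRT (the signal an abort raises) reports a negative return code; `run_isolated` and `JobResult` are hypothetical names, and `os.abort()` stands in for the CHECK failure so the example needs no vulnerable TensorFlow build.

```python
import signal
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class JobResult:
    returncode: int
    aborted: bool  # True when the child was killed by SIGABRT


def run_isolated(code: str, timeout: float = 60.0) -> JobResult:
    """Run untrusted Python source in a child interpreter and report how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        timeout=timeout,
    )
    # On POSIX, subprocess reports "killed by signal N" as returncode == -N.
    aborted = proc.returncode == -signal.SIGABRT
    return JobResult(returncode=proc.returncode, aborted=aborted)


if __name__ == "__main__":
    # os.abort() is a stand-in for the CHECK-fail abort described above.
    result = run_isolated("import os; os.abort()")
    print("child aborted:", result.aborted)
```

The parent survives the child's abort, and the `aborted` flag doubles as the crash signal that step 5 of the recommended actions suggests monitoring.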

Weaknesses (CWE)

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
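The 5.5 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The sketch below is a compact illustration using the metric weights and round-up rule from the CVSS v3.1 specification; `base_score` and `roundup` are illustrative helper names, and temporal and environmental metrics are ignored.

```python
import math

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/...' vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"

    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * pr
        * WEIGHTS["UI"][metrics["UI"]]
    )

    c, i, a = (WEIGHTS["CIA"][metrics[k]] for k in ("C", "I", "A"))
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

For this vector the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.83. Their sum, about 5.43, rounds up to 5.5. The score is driven entirely by availability impact (A:H), since confidentiality and integrity are both None.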

Timeline

Published
May 14, 2021
Last Modified
November 21, 2024
First Seen
May 14, 2021

Related Vulnerabilities