CVE-2021-29608: TensorFlow: heap OOB in RaggedTensorToTensor op

HIGH PoC AVAILABLE
Published May 14, 2021
CISO Take

Any TensorFlow deployment older than 2.5.0 that is not on a patched backport release (2.1.4, 2.2.3, 2.3.3, 2.4.2) is vulnerable to heap out-of-bounds access via malformed ragged tensor inputs, which can enable local privilege escalation and, in the worst case, full system compromise. Upgrade to TF 2.5.0 or the matching cherrypick release (2.1.4, 2.2.3, 2.3.3, 2.4.2) immediately. Prioritize ML training clusters and multi-tenant inference servers where low-privileged users can submit ops.

What is the risk?

High risk for shared or multi-tenant ML infrastructure. The CVSS 7.8 score reflects a local, low-complexity, low-privilege vector: any authenticated user on a shared training node or Jupyter environment can trigger the bug. DCHECK guards are compiled out in release builds, removing the only defensive layer. The CVE is not listed in CISA KEV, which lowers urgency for internet-exposed systems, but insider threats and compromised ML user accounts remain a credible path to host takeover.

What systems are affected?

Package      Ecosystem   Vulnerable Range                      Patched
TensorFlow   pip         < 2.5.0 (except patched backports)    2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0


How severe is it?

CVSS 3.1
7.8 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 14% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Moderate
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

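Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High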

What should I do?

5 steps
  1. Patch: Upgrade to TensorFlow 2.5.0 or backport releases 2.1.4/2.2.3/2.3.3/2.4.2.

  2. Workaround: Restrict access to tf.raw_ops.RaggedTensorToTensor via op allowlisting if running custom serving infrastructure.

  3. Network isolation: Ensure TF Serving endpoints are not directly reachable by untrusted users.

  4. Detection: Audit for anomalous process spawning or privilege escalation events on ML training hosts; monitor for empty-tensor inputs passed to RaggedTensor ops in serving logs.

  5. Inventory: Scan all ML environments (notebooks, CI/CD pipelines, serving containers) for vulnerable TF versions using package managers or SBOM tooling.
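For the inventory step, a version check along the following lines could be run against package listings or SBOM output. This is a minimal sketch: the helper names `parse_version` and `is_vulnerable` are hypothetical, the version boundaries come from the advisory text on this page, and pre-release suffixes such as `rc0` are ignored (an assumption; review such builds manually).

```python
import re

# Minimum patched patch-level for each backported minor series, per the advisory.
PATCHED_BACKPORTS = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
# First minor series that ships with the fix.
FIRST_FIXED_SERIES = (2, 5)


def parse_version(version):
    """Return (major, minor, patch) from a string such as '2.4.1' or '2.4.1rc0'."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_vulnerable(version):
    """True if the given TensorFlow version falls in the affected range."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= FIRST_FIXED_SERIES:
        return False
    minimum_patch = PATCHED_BACKPORTS.get((major, minor))
    if minimum_patch is None:
        # Series older than 2.1 have no backport listed in the advisory.
        return True
    return patch < minimum_patch


if __name__ == "__main__":
    for candidate in ["2.0.4", "2.3.2", "2.3.3", "2.4.1", "2.5.0"]:
        status = "VULNERABLE" if is_vulnerable(candidate) else "ok"
        print(candidate, status)
```

On a live host, the installed version string can be obtained with `importlib.metadata.version("tensorflow")` from the standard library and passed to `is_vulnerable`.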

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 9 - Risk management system
ISO 42001
A.7.4 - Third-party and external AI components risk management
NIST AI RMF
GOVERN-6.2 - Policies for third-party AI risk
MANAGE-2.2 - Mechanisms for sustaining AI system integrity across lifecycle
OWASP LLM Top 10
LLM05:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2021-29608?

CVE-2021-29608 is a heap out-of-bounds vulnerability in TensorFlow's tf.raw_ops.RaggedTensorToTensor op. The op does not validate all of its input arguments, so empty inputs trigger undefined behavior, and the DCHECK guards that would otherwise catch this are no-ops in release builds. It affects TensorFlow releases older than 2.5.0 that are not on a patched backport (2.1.4, 2.2.3, 2.3.3, 2.4.2).

Is CVE-2021-29608 actively exploited?

CVE-2021-29608 is not listed in CISA KEV, but proof-of-concept exploit code is publicly available (indexed in trickest/cve), which increases the risk of exploitation.

How to fix CVE-2021-29608?

  1. Patch: Upgrade to TensorFlow 2.5.0 or backport releases 2.1.4/2.2.3/2.3.3/2.4.2.

  2. Workaround: Restrict access to tf.raw_ops.RaggedTensorToTensor via op allowlisting if running custom serving infrastructure.

  3. Network isolation: Ensure TF Serving endpoints are not directly reachable by untrusted users.

  4. Detection: Audit for anomalous process spawning or privilege escalation events on ML training hosts; monitor for empty-tensor inputs passed to RaggedTensor ops in serving logs.

  5. Inventory: Scan all ML environments (notebooks, CI/CD pipelines, serving containers) for vulnerable TF versions using package managers or SBOM tooling.
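Before restricting the op, it helps to know where user code references it. The sketch below walks a directory tree and reports Python source lines that mention `RaggedTensorToTensor`. The function name `find_op_references` is hypothetical, and the scan is textual only: higher-level ragged tensor APIs may reach the same kernel without naming the op, so an empty result is not proof that the op is unused.

```python
import os
import re

# Matches the op name as a whole word, e.g. tf.raw_ops.RaggedTensorToTensor(...)
OP_PATTERN = re.compile(r"\bRaggedTensorToTensor\b")


def find_op_references(root):
    """Return (path, line_number, stripped_line) for each .py line naming the op."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files with odd encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line_number, line in enumerate(handle, start=1):
                    if OP_PATTERN.search(line):
                        hits.append((path, line_number, line.strip()))
    return hits


if __name__ == "__main__":
    for path, line_number, text in find_op_references("."):
        print(f"{path}:{line_number}: {text}")
```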

What systems are affected by CVE-2021-29608?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, notebook environments, data preprocessing pipelines.

What is the CVSS score for CVE-2021-29608?

CVE-2021-29608 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.23%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, notebook environments, data preprocessing pipelines

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0037 Data from Local System
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 9
ISO 42001: A.7.4
NIST AI RMF: GOVERN-6.2, MANAGE-2.2
OWASP LLM Top 10: LLM05:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. Due to lack of validation in `tf.raw_ops.RaggedTensorToTensor`, an attacker can exploit an undefined behavior if input arguments are empty. The implementation (https://github.com/tensorflow/tensorflow/blob/656e7673b14acd7835dc778867f84916c6d1cac2/tensorflow/core/kernels/ragged_tensor_to_tensor_op.cc#L356-L360) only checks that one of the tensors is not empty, but does not check for the other ones. There are multiple `DCHECK` validations to prevent heap OOB, but these are no-op in release builds, hence they don't prevent anything. The fix will be included in TensorFlow 2.5.0. We will also cherrypick these commits on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
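Where patching is delayed, a caller-side guard can reject empty arguments before they reach the op. The sketch below is not the upstream fix: `num_elements` and `check_ragged_inputs` are hypothetical helpers, the parameter names mirror the op's `shape`, `values`, `default_value`, and `row_partition_tensors` arguments, and the policy of rejecting every empty input is a deliberately conservative assumption that may also refuse some legitimate calls.

```python
def num_elements(data):
    """Count scalar elements in nested lists/tuples.

    Objects exposing an integer `.size` (for example NumPy arrays) use that
    value; anything else is treated as a single scalar.
    """
    size = getattr(data, "size", None)
    if isinstance(size, int):
        return size
    if isinstance(data, (list, tuple)):
        return sum(num_elements(item) for item in data)
    return 1


def check_ragged_inputs(shape, values, default_value, row_partition_tensors):
    """Return a list of problems; an empty list means no empty input was found."""
    problems = []
    named_inputs = {"shape": shape, "values": values, "default_value": default_value}
    for name, data in named_inputs.items():
        if num_elements(data) == 0:
            problems.append(f"{name} is empty")
    if len(row_partition_tensors) == 0:
        problems.append("row_partition_tensors is empty")
    for index, data in enumerate(row_partition_tensors):
        if num_elements(data) == 0:
            problems.append(f"row_partition_tensors[{index}] is empty")
    return problems


if __name__ == "__main__":
    # A request with empty values and an empty row partition is flagged twice.
    print(check_ragged_inputs([-1, -1], [], 0, [[]]))
```

A serving wrapper could call `check_ragged_inputs` on the decoded request and return an error instead of invoking the op whenever the list is non-empty.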

Exploitation Scenario

An adversary with a low-privileged account on a shared ML training cluster submits a TensorFlow job containing a crafted call to tf.raw_ops.RaggedTensorToTensor with an intentionally empty input tensor. Because the op does not validate every input and the DCHECK guards are compiled out of release builds, the call triggers undefined behavior and heap OOB access. On a vulnerable host, this can give the attacker a memory corruption primitive for overwriting adjacent heap structures and running code with the privileges of the TensorFlow process, which is often a service account with access to training data, model artifacts, and cloud credentials stored in environment variables.

Weaknesses (CWE)

CWE-131 — Incorrect Calculation of Buffer Size: The product does not correctly calculate the size to be used when allocating a buffer, which could lead to a buffer overflow.

  • [Implementation] When allocating a buffer for the purpose of transforming, converting, or encoding an input, allocate enough memory to handle the largest possible encoding. For example, in a routine that converts "&" characters to "&amp;" for HTML entity encoding, the output buffer needs to be at least 5 times as large as the input buffer.
  • [Implementation] Understand the programming language's underlying representation and how it interacts with numeric calculation (CWE-681). Pay close attention to byte size discrepancies, precision, signed/unsigned distinctions, truncation, conversion and casting between types, "not-a-number" calculations, and how the language handles numbers that are too large or too small for its underlying representation. [REF-7] Also be careful to account for 32-bit, 64-bit, and other potential differences that may affect the numeric representation.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
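To show how this vector yields the 7.8 base score, the sketch below parses the string and applies the CVSS v3.1 base-score equations. The function names are hypothetical, only base metrics are handled, and the metric weights are the published v3.1 constants.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on Scope (U = unchanged, C = changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def parse_vector(vector):
    """Split 'CVSS:3.1/AV:L/...' into a dict such as {'AV': 'L', ...}."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix!r}")
    return dict(part.split(":", 1) for part in parts)


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    metrics = parse_vector(vector)
    scope = metrics["S"]
    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[scope][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum, roughly 7.71, rounds up to 7.8.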

Timeline

Published
May 14, 2021
Last Modified
November 21, 2024
First Seen
May 14, 2021

Related Vulnerabilities