CVE-2021-29532: TensorFlow: heap OOB read via RaggedCross op

HIGH PoC AVAILABLE
Published May 14, 2021
CISO Take

TensorFlow's RaggedCross kernel operation lacks bounds validation on user-supplied tensor indices, enabling heap out-of-bounds reads that can disclose adjacent process memory or crash ML workloads. Any environment where untrusted users submit TensorFlow operations — shared Jupyter clusters, TF Serving endpoints — is the primary exposure surface. Patch immediately to TF 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4.

What is the risk?

Moderate-to-high risk for multi-tenant ML infrastructure. The CVSS score is 7.1; the local attack vector and low-privilege requirement make shared training clusters and notebook environments the primary blast radius. Confidentiality impact is high (heap memory disclosure) and availability impact is high (process crash). The CVE is not in CISA KEV and no active exploitation has been reported, which reduces immediate urgency, but the low attack complexity means any authorized platform user could weaponize it with little effort.

What systems are affected?

Package      Ecosystem   Vulnerable Range   Patched
TensorFlow   pip         (not listed)       2.5.0, 2.4.2, 2.3.3, 2.2.3, 2.1.4


How severe is it?

CVSS 3.1: 7.1 / 10
EPSS: 0.2% chance of exploitation in 30 days (higher than 10% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Moderate
Exploitation Confidence: Medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

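Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): None
Availability (A): High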

What should I do?

5 steps
  1. Patch: Upgrade to TensorFlow 2.5.0, or apply cherry-picked fixes in 2.4.2, 2.3.3, 2.2.3, or 2.1.4.

  2. Workaround: Restrict access to tf.raw_ops.RaggedCross via TF op allowlisting in serving environments if patching is delayed.

  3. Network controls: Do not expose raw TF op execution to untrusted clients; restrict TF Serving endpoints to authenticated internal consumers only.

  4. Detection: Monitor for abnormal process crashes or heap corruption signals (SIGSEGV, abort traps) in TF workloads.

  5. Container isolation: Run TF workloads in isolated containers per-user or per-tenant to limit memory disclosure blast radius.
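The patch step above can be turned into a quick inventory check. The following is a minimal sketch, assuming version strings of the form major.minor.patch; the names PATCHED_MINIMUMS, parse_version and is_patched are illustrative and not part of TensorFlow. Release lines the advisory does not name are treated as unpatched, and pre-release suffixes are ignored; both are assumptions made here for safety and simplicity.

```python
import re

# Fixed releases named in the advisory, keyed by (major, minor) release line.
PATCHED_MINIMUMS = {
    (2, 1): (2, 1, 4),
    (2, 2): (2, 2, 3),
    (2, 3): (2, 3, 3),
    (2, 4): (2, 4, 2),
}
FIRST_FIXED_RELEASE = (2, 5, 0)  # 2.5.0 and later releases carry the fix


def parse_version(text):
    """Extract (major, minor, patch) from a string such as '2.4.1'.

    Assumption: any suffix after the patch number (e.g. 'rc1') is ignored.
    """
    match = re.match(r"^\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version_text):
    """Return True if the version is at or above a fixed release for its line."""
    version = parse_version(version_text)
    if version >= FIRST_FIXED_RELEASE:
        return True
    minimum = PATCHED_MINIMUMS.get(version[:2])
    # Assumption: release lines not named in the advisory (e.g. 2.0.x)
    # are conservatively reported as unpatched.
    return minimum is not None and version >= minimum
```

In practice the installed version string could be obtained with importlib.metadata.version("tensorflow") and passed to is_patched.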

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 9 - Risk management system
ISO 42001
A.6.2.5 - AI system testing and validation
NIST AI RMF
MANAGE 2.2 - Mechanisms to sustain the value of deployed AI systems

Frequently Asked Questions

What is CVE-2021-29532?

CVE-2021-29532 is a heap out-of-bounds read in TensorFlow's RaggedCross kernel operation, which lacks bounds validation on user-supplied tensor indices. Exploitation can disclose adjacent process memory or crash ML workloads. The primary exposure is any environment where untrusted users can submit TensorFlow operations, such as shared Jupyter clusters and TF Serving endpoints. The fix is in TF 2.5.0, 2.4.2, 2.3.3, 2.2.3, and 2.1.4.

Is CVE-2021-29532 actively exploited?

No active exploitation has been reported, and CVE-2021-29532 is not listed in CISA KEV. Proof-of-concept exploit code is publicly available, however, which increases the risk of exploitation.

How to fix CVE-2021-29532?

  1. Patch: Upgrade to TensorFlow 2.5.0, or apply cherry-picked fixes in 2.4.2, 2.3.3, 2.2.3, or 2.1.4.

  2. Workaround: Restrict access to tf.raw_ops.RaggedCross via TF op allowlisting in serving environments if patching is delayed.

  3. Network controls: Do not expose raw TF op execution to untrusted clients; restrict TF Serving endpoints to authenticated internal consumers only.

  4. Detection: Monitor for abnormal process crashes or heap corruption signals (SIGSEGV, abort traps) in TF workloads.

  5. Container isolation: Run TF workloads in isolated containers per-user or per-tenant to limit memory disclosure blast radius.
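On shared notebook platforms, the workaround step can be supplemented by screening submitted code for references to the affected op before it runs. The sketch below is a heuristic only, assuming the submitted code is available as Python source text; find_op_references is a hypothetical helper, not a TensorFlow API and not TensorFlow's own op allowlisting. Dynamic lookups such as getattr with a computed name will evade it, so it does not replace patching.

```python
import ast


def find_op_references(source, op_name="RaggedCross"):
    """Return sorted line numbers where `op_name` is referenced in `source`.

    Matches attribute access (tf.raw_ops.RaggedCross), bare names, and
    `from ... import RaggedCross` statements. Purely static: it parses the
    source and never executes it.
    """
    lines = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and node.attr == op_name:
            lines.add(node.lineno)
        elif isinstance(node, ast.Name) and node.id == op_name:
            lines.add(node.lineno)
        elif isinstance(node, ast.ImportFrom):
            if any(alias.name == op_name for alias in node.names):
                lines.add(node.lineno)
    return sorted(lines)


if __name__ == "__main__":
    sample = "import tensorflow as tf\nout = tf.raw_ops.RaggedCross()\n"
    print(find_op_references(sample))
```

A platform could reject or flag a submission whenever the returned list is non-empty.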

What systems are affected by CVE-2021-29532?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML notebook platforms.

What is the CVSS score for CVE-2021-29532?

CVE-2021-29532 has a CVSS v3.1 base score of 7.1 (HIGH). The EPSS exploitation probability is 0.20%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, shared ML notebook platforms

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0040 AI Model Inference API Access
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 9
ISO 42001: A.6.2.5
NIST AI RMF: MANAGE 2.2

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. An attacker can force accesses outside the bounds of heap-allocated arrays by passing in invalid tensor values to `tf.raw_ops.RaggedCross`. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/efea03b38fb8d3b81762237dc85e579cc5fc6e87/tensorflow/core/kernels/ragged_cross_op.cc#L456-L487) lacks validation for the user-supplied arguments. Each of the branches in that code calls a helper function after accessing array elements via a `*_list[next_*]` pattern, followed by incrementing the `next_*` index. However, as there is no validation that the `next_*` values are in the valid range for the corresponding `*_list` arrays, this results in heap OOB reads. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
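The `*_list[next_*]` pattern described in the advisory can be modelled in a few lines. This is a simplified Python illustration, not the TensorFlow C++ source and not the actual fix: the names input_order, validate_cross_inputs and read_inputs, and the 'R'/'S'/'D' kind codes, are assumptions introduced for the example. Python raises IndexError on an out-of-range index, whereas the C++ kernel would read past the end of the array, so the point here is only the shape of the missing check.

```python
def validate_cross_inputs(input_order, ragged_list, sparse_list, dense_list):
    """Raise ValueError unless every list has exactly the entries the order needs.

    `input_order` is a hypothetical string with one character per input:
    'R' (ragged), 'S' (sparse) or 'D' (dense).
    """
    required = {"R": 0, "S": 0, "D": 0}
    for kind in input_order:
        if kind not in required:
            raise ValueError(f"unknown input kind: {kind!r}")
        required[kind] += 1
    available = {"R": len(ragged_list), "S": len(sparse_list), "D": len(dense_list)}
    for kind, needed in required.items():
        if needed != available[kind]:
            raise ValueError(
                f"input kind {kind!r}: order needs {needed}, got {available[kind]}"
            )


def read_inputs(input_order, ragged_list, sparse_list, dense_list):
    """Walk the inputs in order using per-kind `next_*` counters."""
    validate_cross_inputs(input_order, ragged_list, sparse_list, dense_list)
    lists = {"R": ragged_list, "S": sparse_list, "D": dense_list}
    next_index = {"R": 0, "S": 0, "D": 0}
    ordered = []
    for kind in input_order:
        # Safe only because the counts were validated before the loop.
        ordered.append(lists[kind][next_index[kind]])
        next_index[kind] += 1
    return ordered
```

Validating the counts once, before the loop, keeps the indexing code unchanged while guaranteeing that each `next_*` counter stays within its list.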

Exploitation Scenario

An adversary with low-privilege access to a shared ML training platform (e.g., corporate JupyterHub or ML experimentation environment) submits a crafted notebook calling tf.raw_ops.RaggedCross with deliberately invalid tensor index values. Missing bounds validation on next_* counters causes heap OOB reads into adjacent memory — potentially leaking model weights, training batch data, or authentication tokens from co-tenant sessions sharing the same Python process. In a TF Serving deployment, a client can send crafted gRPC inference requests with malformed RaggedTensor payloads to repeatedly crash the serving process, causing sustained availability impact on ML-powered downstream applications.
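Crashes of the kind described in this scenario can be surfaced by running each workload in a child process and inspecting how it exited. The sketch below assumes a POSIX host, where Python's subprocess module reports termination by signal as a negative return code; classify_exit and run_isolated are hypothetical helper names. It detects crashes but does not prevent memory disclosure inside the child.

```python
import signal
import subprocess
import sys


def classify_exit(returncode):
    """Map a child return code to 'ok', 'error', or 'crash:<SIGNAL>'."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal as the negated
        # signal number (e.g. SIGSEGV gives a negative SIGSEGV value).
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"crash:{name}"
    return "error"


def run_isolated(code):
    """Run a Python snippet in a separate process and classify its exit.

    A crash in the child cannot take down the calling process.
    """
    completed = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return classify_exit(completed.returncode)
```

A monitoring job could count "crash:" results per user or tenant and alert when the rate rises above the platform's normal baseline.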

Weaknesses (CWE)

CWE-125 — Out-of-bounds Read: The product reads data past the end, or before the beginning, of the intended buffer.

  • [Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation.
  • [Architecture and Design] Use a language that provides appropriate memory abstractions.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:H
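The 7.1 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The following is a minimal sketch covering base metrics only; the function names are illustrative, and the metric weights and rounding rule are taken from the CVSS v3.1 specification as understood here.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required weights depend on Scope (U = unchanged, C = changed).
PR = {"U": {"N": 0.85, "L": 0.62, "H": 0.27}, "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(value):
    """Round up to one decimal place, as defined for CVSS v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score from a base-metric vector string."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are handled in this sketch")
    m = dict(part.split(":") for part in parts[1:])
    scope = m["S"]
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
        * PR[scope][m["PR"]] * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:H"))  # 7.1
```

For this vector the impact sub-score is about 5.18 and the exploitability sub-score about 1.83; their sum of roughly 7.01 rounds up to 7.1.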

Timeline

Published: May 14, 2021
Last Modified: November 21, 2024
First Seen: May 14, 2021

Related Vulnerabilities