CVE-2021-29569: TensorFlow OOB heap read

CISO Take

A heap out-of-bounds read in TensorFlow's MaxPoolGradWithArgmax op allows any local user with low privileges to leak heap memory or crash the TF runtime by passing empty tensors. In shared ML environments (multi-user Jupyter servers, training clusters, model serving endpoints), any tenant who can invoke TF ops can trigger it. Upgrade immediately to TF 2.5.0 or to a backported release (2.1.4, 2.2.3, 2.3.3, or 2.4.2); the only workaround is input validation at the application layer.

What is the risk?

Moderate in isolated single-user environments; elevated in shared ML infrastructure. The local attack vector limits internet-facing exposure, but multi-tenant GPU servers, JupyterHub deployments, and MLOps platforms running shared TF sessions significantly widen the blast radius. The CVSS score of 7.1 reflects high confidentiality impact (heap data leakage) and high availability impact (process crash). The CVE is not listed in CISA KEV and no weaponized exploit has been observed, but the primitive is trivial to construct: any user who can call TF ops can trigger it.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	< 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1	2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

If you run a TensorFlow release older than 2.5.0 that does not include one of the backported fixes, you are affected.
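As a minimal sketch of that check, the following Python compares a version string against the patched releases named in the advisory (2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0). The function names `parse_version` and `is_affected` are hypothetical, and the sketch assumes plain `major.minor.patch` strings; pre-release suffixes such as `rc0` are ignored, which is a simplification.

```python
import re
from typing import Tuple

# First fixed patch level per release line, as listed in the advisory.
FIXED_PATCH = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
# Release lines from 2.5 onward carry the fix.
FIRST_FIXED_LINE = (2, 5)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '2.4.1' or '2.4.1rc0'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_affected(version: str) -> bool:
    """Return True if the version predates the fix on its release line."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= FIRST_FIXED_LINE:
        return False
    fixed_patch = FIXED_PATCH.get((major, minor))
    if fixed_patch is None:
        # Assumption: lines older than 2.1 received no backport, so treat them as affected.
        return True
    return patch < fixed_patch


for candidate in ("2.3.2", "2.4.2", "2.5.0"):
    print(candidate, "affected" if is_affected(candidate) else "patched")
```

In an inventory script, the version string could come from `importlib.metadata.version("tensorflow")` on each host or container image.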

How severe is it?

CVSS 3.1: 7.1 / 10

EPSS: 0.2% chance of exploitation in 30 days (higher than 10% of all CVEs)

Source: EPSS v3 (FIRST.org)

Exploitation Status: Exploit Available

Exploitation: MEDIUM

Sophistication: Trivial

Exploitation Confidence: medium

Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

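Attack Vector (AV): Local

Attack Complexity (AC): Low

Privileges Required (PR): Low

User Interaction (UI): None

Scope (S): Unchanged

Confidentiality (C): High

Integrity (I): None

Availability (A): High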
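To show how these metrics combine into the 7.1 base score, here is a short Python sketch of the CVSS v3.1 base-score arithmetic for the scope-unchanged case. The numeric weights are the ones the CVSS v3.1 specification assigns to each metric value; the function and constant names are illustrative only.

```python
import math

# CVSS v3.1 weights for this CVE's metric values (scope unchanged).
AV_LOCAL = 0.55
AC_LOW = 0.77
PR_LOW_SCOPE_UNCHANGED = 0.62
UI_NONE = 0.85
IMPACT_HIGH = 0.56
IMPACT_NONE = 0.0


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest number with one decimal that is >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for Scope: Unchanged, following the v3.1 equations."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)       # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_LOCAL, AC_LOW, PR_LOW_SCOPE_UNCHANGED, UI_NONE,
    c=IMPACT_HIGH, i=IMPACT_NONE, a=IMPACT_HIGH,
)
print(score)  # 7.1
```

The impact term (about 5.18) dominates the exploitability term (about 1.83), which is consistent with a local-only vector paired with high confidentiality and availability impact.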

What should I do?


Patch: Upgrade to TensorFlow 2.5.0, or to a patched release on an older line (2.4.2, 2.3.3, 2.2.3, or 2.1.4). The fix is commit ef0c008ee84bad91ec6725ddc42091e19a30cf0e.
Input validation: Enforce tensor shape/element-count checks at API boundaries before ops execute — reject empty tensors for ops requiring at least one element.
Network segmentation: If using TF Serving, restrict access to trusted networks; do not expose raw-op endpoints publicly.
Isolation: Run training jobs in dedicated containers or VMs per user/team to contain blast radius if exploited on shared infrastructure.
Detection: Alert on unexpected SIGSEGV or process crashes from TF worker processes; anomalous crash dumps from training jobs warrant investigation.
Audit: Inventory all TF versions deployed across training, serving, and notebook infrastructure — shadow AI deployments are a common blind spot.
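The input validation step above can be sketched as a small shape check that runs before any op is invoked. This is a sketch under stated assumptions, not TensorFlow's own fix: the helper names `num_elements` and `require_non_empty` are hypothetical, and the caller is assumed to already hold each tensor's static shape as a sequence of integers (for example from a request schema). The tensor names `input_min` and `input_max` are taken from the advisory text.

```python
import math
from typing import Mapping, Sequence


def num_elements(shape: Sequence[int]) -> int:
    """Element count of a tensor with the given shape; a scalar shape () has 1."""
    return math.prod(shape)


def require_non_empty(shapes: Mapping[str, Sequence[int]]) -> None:
    """Raise ValueError if any named tensor has zero elements."""
    empty = sorted(name for name, shape in shapes.items() if num_elements(shape) == 0)
    if empty:
        raise ValueError("empty tensor(s) rejected: " + ", ".join(empty))


# A well-formed request passes silently.
require_non_empty({"input_min": (1,), "input_max": ()})

# A request with an empty tensor is rejected before it reaches the op.
try:
    require_non_empty({"input_min": (0,), "input_max": (1,)})
except ValueError as err:
    print(err)
```

Checking the element count rather than the rank matters here: a shape such as (0,) or (3, 0) has a valid rank but no elements, which is the condition the advisory describes.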

How is it classified?

Data Extraction, DoS, Framework

AML.T0010.001 - AI Software

AML.T0043 - Craft Adversarial Data

AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Art. 9 - Risk management system

ISO 42001

A.6.2.6 - Information security for AI systems

NIST AI RMF

MANAGE 2.2 - Mechanisms to sustain the value of deployed AI are in place

OWASP LLM Top 10

LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2021-29569?

A heap out-of-bounds read in TensorFlow's MaxPoolGradWithArgmax op allows any local user with low privileges to leak heap memory or crash the TF runtime by passing empty tensors. In shared ML environments (multi-user Jupyter servers, training clusters, model serving endpoints), any tenant who can invoke TF ops can trigger it. Upgrade immediately to TF 2.5.0 or to a backported release (2.1.4, 2.2.3, 2.3.3, or 2.4.2); the only workaround is input validation at the application layer.

Is CVE-2021-29569 actively exploited?

The CVE is not listed in CISA KEV and no active exploitation has been reported. However, proof-of-concept exploit code is publicly available for CVE-2021-29569, which increases the risk of exploitation.

How to fix CVE-2021-29569?

1. Patch: Upgrade to TensorFlow 2.5.0, or to a patched release on an older line (2.4.2, 2.3.3, 2.2.3, or 2.1.4). The fix is commit ef0c008ee84bad91ec6725ddc42091e19a30cf0e.
2. Input validation: Enforce tensor shape/element-count checks at API boundaries before ops execute, and reject empty tensors for ops requiring at least one element.
3. Network segmentation: If using TF Serving, restrict access to trusted networks; do not expose raw-op endpoints publicly.
4. Isolation: Run training jobs in dedicated containers or VMs per user/team to contain blast radius if exploited on shared infrastructure.
5. Detection: Alert on unexpected SIGSEGV or process crashes from TF worker processes; anomalous crash dumps from training jobs warrant investigation.
6. Audit: Inventory all TF versions deployed across training, serving, and notebook infrastructure. Shadow AI deployments are a common blind spot.

What systems are affected by CVE-2021-29569?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, MLOps platforms, shared Jupyter environments.

What is the CVSS score for CVE-2021-29569?

CVE-2021-29569 has a CVSS v3.1 base score of 7.1 (HIGH). The EPSS exploitation probability is 0.20%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, MLOps platforms, shared Jupyter environments

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0043 Craft Adversarial Data

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art. 9

ISO 42001: A.6.2.6

NIST AI RMF: MANAGE 2.2

OWASP LLM Top 10: LLM05

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. The implementation of `tf.raw_ops.MaxPoolGradWithArgmax` can cause reads outside of bounds of heap allocated data if an attacker supplies specially crafted inputs. The implementation (https://github.com/tensorflow/tensorflow/blob/ac328eaa3870491ababc147822cd04e91a790643/tensorflow/core/kernels/requantization_range_op.cc#L49-L50) assumes that the `input_min` and `input_max` tensors have at least one element, as it accesses the first element in two arrays. If the tensors are empty, `.flat<T>()` is an empty object, backed by an empty array. Hence, accessing even the 0th element is a read outside the bounds. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.

Exploitation Scenario

A malicious insider, or an attacker using a compromised data scientist account, opens a notebook on a shared ML training cluster and calls tf.raw_ops.MaxPoolGradWithArgmax with empty input_min and input_max tensors. TF accesses index 0 of empty flat arrays, reading beyond heap bounds. In the best case for the attacker, adjacent heap memory is returned, potentially containing model weights, training data batches, or API tokens cached in the same process. In an alternative scenario targeting TF Serving, an external attacker submits a crafted gRPC inference request with empty tensors to a publicly exposed serving endpoint, triggering a heap OOB read that crashes the server or leaks response data from co-located requests. Either path requires no special AI/ML knowledge, only familiarity with the TF op API.
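On the defensive side, the crash outcome in this scenario is observable from a supervising process. The Python sketch below assumes a POSIX host where the TF worker is launched as a child process; `describe_exit` and `run_and_flag_crash` are hypothetical helper names, and printing to stderr stands in for whatever alerting pipeline is actually in use. An out-of-bounds read does not always crash, so this catches only the denial-of-service outcome, not silent leakage.

```python
import signal
import subprocess
import sys
from typing import Optional, Sequence


def describe_exit(returncode: int) -> Optional[str]:
    """Return the signal name if a POSIX child was killed by a signal, else None."""
    if returncode >= 0:
        return None  # normal exit, possibly with a nonzero status
    try:
        return signal.Signals(-returncode).name
    except ValueError:
        return f"signal {-returncode}"


def run_and_flag_crash(cmd: Sequence[str]) -> bool:
    """Run a worker command and return True if it died from SIGSEGV."""
    completed = subprocess.run(cmd)
    # On POSIX, subprocess reports death by signal N as returncode -N.
    if describe_exit(completed.returncode) == "SIGSEGV":
        print(f"ALERT: {cmd[0]} crashed with SIGSEGV", file=sys.stderr)
        return True
    return False


# Example: a child that exits cleanly is not flagged.
print(run_and_flag_crash([sys.executable, "-c", "pass"]))
```

In containerized deployments the same signal usually surfaces as exit code 139 (128 + 11) from the shell or runtime, so an equivalent rule can be written against orchestrator events instead of a parent process.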

Weaknesses (CWE)

CWE-125 (Out-of-bounds Read): The product reads data past the end, or before the beginning, of the intended buffer.

[Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation. However, denylis
[Architecture and Design] Use a language that provides appropriate memory abstractions.

Source: MITRE CWE corpus.