CVE-2021-29569: TensorFlow: OOB heap read in MaxPoolGradWithArgmax op

HIGH PoC AVAILABLE
Published May 14, 2021
CISO Take

A heap out-of-bounds read in TensorFlow's MaxPoolGradWithArgmax op allows any local user with low privileges to leak heap memory or crash the TF runtime by passing empty tensors. In shared ML environments — multi-user Jupyter servers, training clusters, model serving endpoints — this is trivially exploitable by any tenant. Patch immediately to TF 2.5.0 or the backported fixes in 2.1.4–2.4.2; there is no workaround short of input validation at the application layer.

Risk Assessment

Moderate in isolated single-user environments; elevated in shared ML infrastructure. The local attack vector limits internet-facing exposure, but multi-tenant GPU servers, JupyterHub deployments, and MLOps platforms running shared TF sessions significantly amplify the blast radius. A CVSS of 7.1 reflects high confidentiality impact (heap data leakage) and high availability impact (process crash). Not in CISA KEV and no public exploit weaponization observed, but the primitive is trivial to construct — any user who can call TF ops can trigger it.

Affected Systems

Package: tensorflow (pip)
Vulnerable range: < 2.1.4; >= 2.2.0, < 2.2.3; >= 2.3.0, < 2.3.3; >= 2.4.0, < 2.4.2
Patched: 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0
Package profile: 195.0K, OpenSSF 7.2, 3.7K dependents, 4% patched, ~1372d to patch

Do you use tensorflow? If your installed release is older than the patched version for its branch, you're affected.
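The sketch below checks an installed version string against the fixed releases named in this advisory: 2.1.4, 2.2.3, 2.3.3, 2.4.2, and 2.5.0. It rests on two assumptions. First, version strings are plain `X.Y.Z`. Second, every release older than 2.1 counts as unfixed. The helper names `parse_version` and `is_vulnerable` are illustrative and are not part of any TensorFlow API.

```python
import re
from typing import Tuple

# First fixed release on each affected branch, as named in this advisory.
FIXED_ON_BRANCH = {
    (2, 1): (2, 1, 4),
    (2, 2): (2, 2, 3),
    (2, 3): (2, 3, 3),
    (2, 4): (2, 4, 2),
}
FIRST_FIXED_MINOR = (2, 5, 0)  # 2.5.0 and later ship with the fix


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'X.Y.Z' string; suffixes such as 'rc0' are rejected."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_vulnerable(version: str) -> bool:
    """True if this release predates the fix on its branch."""
    v = parse_version(version)
    if v >= FIRST_FIXED_MINOR:
        return False
    fixed = FIXED_ON_BRANCH.get(v[:2])
    if fixed is None:
        # Branches older than 2.1 (e.g. 1.15, 2.0) have no fixed release listed.
        return True
    return v < fixed
```

To check a real host, get the installed version from the standard library. `importlib.metadata.version("tensorflow")` (Python 3.8+) returns the version string and raises `PackageNotFoundError` if the package is absent. Pre-release and locally built versions still need manual review.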

Severity & Risk

CVSS 3.1: 7.1 / 10
EPSS: 0.0% chance of exploitation in 30 days (higher than 1% of all CVEs)
Exploitation status: Exploit available (exploitation: MEDIUM)
Sophistication: Trivial
Exploitation confidence: medium (public PoC indexed in trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

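Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): None
Availability (A): High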

Recommended Action

6 steps
  1. Patch: Upgrade to TensorFlow 2.5.0. On older branches, upgrade to the patched release 2.4.2, 2.3.3, 2.2.3, or 2.1.4, each of which carries the backported fix (commit ef0c008ee84bad91ec6725ddc42091e19a30cf0e).

  2. Input validation: Enforce tensor shape/element-count checks at API boundaries before ops execute — reject empty tensors for ops requiring at least one element.

  3. Network segmentation: If using TF Serving, restrict access to trusted networks. Do not expose serving endpoints publicly, especially for models that route client-supplied tensors into low-level ops.

  4. Isolation: Run training jobs in dedicated containers or VMs per user/team to contain blast radius if exploited on shared infrastructure.

  5. Detection: Alert on unexpected SIGSEGV or process crashes from TF worker processes; anomalous crash dumps from training jobs warrant investigation.

  6. Audit: Inventory all TF versions deployed across training, serving, and notebook infrastructure — shadow AI deployments are a common blind spot.
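Step 2 can be sketched as a pre-flight check that runs before the tensors reach the op. The sketch makes three assumptions:

- It validates only static shapes. In eager TensorFlow, `tuple(tensor.shape)` yields plain integers.
- It fails closed when a dimension is unknown.
- `check_min_max_inputs` is a hypothetical helper name.

The `input_min`/`input_max` names follow the NVD description.

```python
import math
from typing import Optional, Sequence


def num_elements(shape: Sequence[Optional[int]]) -> Optional[int]:
    """Element count of a static shape, or None if any dimension is unknown."""
    if any(dim is None for dim in shape):
        return None
    return math.prod(shape)  # math.prod(()) == 1, so a scalar counts as one element


def check_min_max_inputs(
    min_shape: Sequence[Optional[int]], max_shape: Sequence[Optional[int]]
) -> None:
    """Raise ValueError unless both range tensors are known to hold >= 1 element."""
    for name, shape in (("input_min", min_shape), ("input_max", max_shape)):
        count = num_elements(shape)
        if count is None:
            # Fail closed: a dynamic shape cannot be validated statically.
            raise ValueError(f"{name}: shape {tuple(shape)} is not fully defined")
        if count < 1:
            raise ValueError(
                f"{name} must have at least one element, got shape {tuple(shape)}"
            )
```

For example, call `check_min_max_inputs(tuple(t_min.shape), tuple(t_max.shape))` before `tf.raw_ops.RequantizationRange(input=..., input_min=t_min, input_max=t_max)`. This rejects the empty-tensor case in application code. Upgrading remains the actual fix; the check only narrows exposure on hosts that cannot be patched yet.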

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act: Art. 9 - Risk management system
ISO 42001: A.6.2.6 - Information security for AI systems
NIST AI RMF: MANAGE 2.2 - Mechanisms to sustain the value of deployed AI are in place
OWASP LLM Top 10: LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

Is CVE-2021-29569 actively exploited?

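There is no evidence of active exploitation. CVE-2021-29569 is not in CISA KEV, and no weaponized exploit has been observed. However, proof-of-concept code is publicly available (indexed in trickest/cve), which lowers the bar for exploitation.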

What systems are affected by CVE-2021-29569?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, MLOps platforms, shared Jupyter environments.

What is the CVSS score for CVE-2021-29569?

CVE-2021-29569 has a CVSS v3.1 base score of 7.1 (HIGH). The EPSS exploitation probability is 0.01%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. The implementation of `tf.raw_ops.MaxPoolGradWithArgmax` can cause reads outside of bounds of heap allocated data if attacker supplies specially crafted inputs. The implementation (https://github.com/tensorflow/tensorflow/blob/ac328eaa3870491ababc147822cd04e91a790643/tensorflow/core/kernels/requantization_range_op.cc#L49-L50) assumes that the `input_min` and `input_max` tensors have at least one element, as it accesses the first element in two arrays. If the tensors are empty, `.flat<T>()` is an empty object, backed by an empty array. Hence, accessing even the 0th element is a read outside the bounds. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.

Exploitation Scenario

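A malicious insider or a compromised data-scientist account on a shared ML training cluster opens a notebook and calls the vulnerable kernel with empty `input_min` and `input_max` tensors. Those two arguments belong to `tf.raw_ops.RequantizationRange`, the op implemented in the `requantization_range_op.cc` file that the NVD description links to; `tf.raw_ops.MaxPoolGradWithArgmax` has no such inputs. The kernel reads element 0 of each empty flat array, which is a read past the end of a heap allocation.

In the best case for the attacker, bytes from adjacent heap memory surface through the op's output. These could include fragments of model weights, training data batches, or API tokens cached in the same process. Otherwise, the read faults and crashes the runtime.

An alternative scenario targets TF Serving. An external attacker submits a crafted gRPC inference request with empty tensors to a publicly exposed serving endpoint. This triggers the same out-of-bounds read only if the served model passes client-supplied tensors into the vulnerable op. The result is a crashed server or, potentially, leaked data from co-located requests.

Neither path requires special AI/ML knowledge, only familiarity with the TF op API.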
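An operator can confirm whether a host still accepts the empty-tensor inputs by running a probe in a child process. A crash then surfaces as a return code instead of killing the caller. This is a minimal sketch under several assumptions:

- The probe targets `tf.raw_ops.RequantizationRange`, whose `input_min`/`input_max` arguments and source file the NVD description references.
- Patched builds reject empty ranges with `InvalidArgumentError` or `ValueError`.
- `tf.constant([1], shape=[1], dtype=tf.qint32)` is accepted in eager mode.
- A negative return code means death by signal, as on POSIX.

The exit codes and helper names are hypothetical. Run it only on hosts you administer.

```python
import signal
import subprocess
import sys

EXIT_REJECTED = 0  # probe saw the op refuse empty min/max tensors
EXIT_ACCEPTED = 3  # probe saw the op run on empty min/max tensors

# Probe source executed in a separate interpreter.
PROBE = f"""
import sys
import tensorflow as tf
try:
    tf.raw_ops.RequantizationRange(
        input=tf.constant([1], shape=[1], dtype=tf.qint32),
        input_min=tf.constant([], dtype=tf.float32),
        input_max=tf.constant([], dtype=tf.float32),
    )
except (tf.errors.InvalidArgumentError, ValueError):
    sys.exit({EXIT_REJECTED})
sys.exit({EXIT_ACCEPTED})
"""


def classify(returncode: int) -> str:
    """Map the child's exit status to a patch verdict."""
    if returncode == EXIT_REJECTED:
        return "rejected"
    if returncode == EXIT_ACCEPTED:
        return "accepted"
    if returncode < 0:
        # POSIX: a negative code means the child was killed by that signal.
        try:
            return f"crashed ({signal.Signals(-returncode).name})"
        except ValueError:
            return f"crashed (signal {-returncode})"
    return "probe-error"


def run_probe(code: str = PROBE, timeout: float = 120.0) -> str:
    """Run the probe in a fresh interpreter so a segfault cannot kill the caller."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, timeout=timeout
    )
    return classify(result.returncode)
```

`run_probe()` returns one of four verdicts:

- `"rejected"`: the installed build refuses empty ranges.
- `"accepted"`: the op ran anyway, which is itself an out-of-bounds read even if nothing crashed.
- `"crashed (...)"`: the read faulted.
- `"probe-error"`: anything else, such as TensorFlow not being importable.

The key design choice is the separate interpreter. An out-of-bounds read may segfault, and the parent process should survive to report it.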

Weaknesses (CWE)

CWE-125: Out-of-bounds Read

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:H
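The vector above can be checked against the 7.1 base score using the CVSS v3.1 base-score equations published by FIRST. This minimal sketch covers base metrics only, not temporal or environmental ones. It uses the specification's metric weights and its Roundup function; the function names are illustrative.

```python
import math

# Base-metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}  # PR weighs more when scope changes
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Spec Roundup: smallest one-decimal number >= value, robust to float noise."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    # "CVSS:3.1/AV:L/..." -> {"AV": "L", ...}; the version prefix is skipped.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    scope_changed = metrics["S"] == "C"

    # Impact sub-score (ISS) from confidentiality, integrity, availability.
    iss = 1 - (
        (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    )
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    pr_weights = PR_CHANGED if scope_changed else PR_UNCHANGED
    exploitability = (
        8.22
        * AV[metrics["AV"]]
        * AC[metrics["AC"]]
        * pr_weights[metrics["PR"]]
        * UI[metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))
```

Here is how this vector reaches 7.1:

- ISS = 1 - (0.44 × 1.0 × 0.44) = 0.8064.
- Impact = 6.42 × 0.8064 ≈ 5.177.
- Exploitability = 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.835.
- Their sum is ≈ 7.012, which Roundup lifts to 7.1.

The local attack vector (0.55) is what keeps an otherwise high-impact bug below the 8.0 range.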

Timeline

Published: May 14, 2021
Last Modified: November 21, 2024
First Seen: May 14, 2021

Related Vulnerabilities