CVE-2021-29529: TensorFlow heap buffer overflow

CISO Take

Upgrade TensorFlow to 2.5.0 (or 2.4.2/2.3.3/2.2.3/2.1.4) on every system running quantized image processing pipelines. Local exploit with low complexity means a malicious insider or compromised ML worker node can achieve arbitrary code execution within the TensorFlow process. Priority is highest in shared multi-tenant ML environments like Jupyter hubs, MLflow servers, and GPU training clusters.

What is the risk?

CVSS 7.8 High, but the local-only attack vector prevents direct remote exploitation, limiting realistic exposure. Risk escalates sharply in shared ML compute environments where multiple principals have local access and blast radius expands. Quantized models are standard in edge and optimized inference deployments, broadening the affected surface area. Not in CISA KEV and no confirmed active exploitation, but the vulnerability is fully public with a disclosed PoC in the GitHub advisory.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	< 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1	2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

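Do you run TensorFlow older than 2.5.0 without the 2.4.2, 2.3.3, 2.2.3, or 2.1.4 backport (or a later patch release on those branches)? You're affected.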
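The version ranges can be checked mechanically. The following is a minimal sketch, not an official tool: the helper names (`parse_version`, `is_vulnerable`, `installed_tensorflow_version`) are hypothetical, the fixed versions come from the advisory text, and treating branches older than 2.1 as vulnerable is an assumption (they received no backport). Pre-release builds such as a 2.5.0 release candidate are not distinguished from the final release and should be reviewed by hand.

```python
import re
from importlib import metadata

# First fixed patch release per maintained branch, as named in the advisory.
FIXED_PATCH = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
FIRST_FIXED_BRANCH = (2, 5)  # 2.5.0 and later contain the fix


def parse_version(text):
    """Return (major, minor, patch) from strings like '2.4.1' or '2.4.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_vulnerable(version_text):
    """True if the version predates the fix on its branch."""
    major, minor, patch = parse_version(version_text)
    if (major, minor) >= FIRST_FIXED_BRANCH:
        return False
    fixed = FIXED_PATCH.get((major, minor))
    if fixed is None:
        # Assumption: branches without a backport are treated as vulnerable.
        return True
    return patch < fixed


def installed_tensorflow_version():
    """Return the installed 'tensorflow' distribution version, or None."""
    try:
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_tensorflow_version()
    if found is None:
        print("tensorflow is not installed in this environment")
    else:
        status = "VULNERABLE" if is_vulnerable(found) else "patched"
        print(f"tensorflow {found}: {status}")
```

Running the script inside each virtual environment or container image gives one line per environment. Variant distributions (for example GPU-specific or CPU-only packages) are published under other names and would need their own lookup.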

How severe is it?

CVSS 3.1

7.8 / 10

EPSS

0.3%

chance of exploitation in 30 days

Higher than 16% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Moderate

Exploitation Confidence

medium

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Local

AC Low

PR Low

UI None

S Unchanged

C High

I High

A High
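For readers who want to see how these eight metrics produce the 7.8 score, here is a small sketch of the CVSS v3.1 base-score arithmetic for the Scope Unchanged case. The vector string is assembled from the metrics listed above; the function names are hypothetical, and the sketch deliberately does not handle Scope Changed vectors.

```python
import math

# CVSS v3.1 metric weights (PR values are the Scope Unchanged ones).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(vector):
    """Compute the base score for a CVSS:3.1 vector with S:U."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only covers Scope Unchanged")
    c, i, a = (WEIGHTS["CIA"][metrics[k]] for k in ("C", "I", "A"))
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 6.42 * iss
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * WEIGHTS["PR"][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    vector = "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"
    print(base_score_scope_unchanged(vector))
```

With these inputs the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum, roughly 7.71, rounds up to 7.8. The Local attack vector is what keeps the exploitability term low despite the High impact ratings.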

What should I do?


Patch: upgrade to TensorFlow 2.5.0 or later, or to a patched maintenance release (2.4.2, 2.3.3, 2.2.3, or 2.1.4).
Inventory: scan all ML servers, containers, and CI/CD pipelines for vulnerable TensorFlow versions using `pip show tensorflow` or OCI image scanning.
Isolate: in multi-tenant environments, enforce container/VM-level isolation so TF processes cannot cross trust boundaries.
Restrict: limit direct access to raw TF ops APIs in multi-tenant inference services; prefer high-level Keras APIs that validate inputs.
Detect: alert on unexpected memory errors (SIGSEGV, heap corruption logs) from TF serving processes as potential exploit indicators.
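The Detect step could start as a simple log filter. The sketch below is an assumption-laden starting point rather than a detection rule of record: the pattern list is illustrative (common crash strings from the OS, glibc, and AddressSanitizer), the function name `find_crash_lines` is hypothetical, and a match indicates a memory fault worth triaging, not a confirmed exploit.

```python
import re

# Illustrative crash indicators; tune these to your own serving logs.
CRASH_PATTERNS = [
    r"SIGSEGV",
    r"segmentation fault",
    r"heap-buffer-overflow",
    r"double free or corruption",
    r"malloc\(\): ",
    r"free\(\): invalid",
]
CRASH_RE = re.compile("|".join(CRASH_PATTERNS), re.IGNORECASE)


def find_crash_lines(lines):
    """Return (line_number, text) pairs for lines that look like memory faults."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if CRASH_RE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    sample = [
        "INFO serving request 17\n",
        "Fatal Python error: Segmentation fault\n",
        "ERROR: AddressSanitizer: heap-buffer-overflow on address ...\n",
    ]
    for number, text in find_crash_lines(sample):
        print(f"line {number}: {text}")
```

In practice the same function can be fed an open log file object, since iterating a file yields lines. Repeated hits from one tenant or one model are a stronger signal than a single crash.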

How is it classified?

Tags: Code Execution, Supply Chain, Framework, Inference
MITRE ATLAS: AML.T0010.001 - AI Software; AML.T0043.003 - Manual Modification; AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Art.15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2 - AI system lifecycle — Vulnerability management

NIST AI RMF

MANAGE-2.2 - Mechanisms to sustain, update, and decommission AI systems

OWASP LLM Top 10

LLM03:2025 - Supply Chain

Frequently Asked Questions

What is CVE-2021-29529?

CVE-2021-29529 is a heap buffer overflow in TensorFlow's `tf.raw_ops.QuantizedResizeBilinear` kernel. Crafted input values make float rounding produce an off-by-one error in the interpolation bounds, so the kernel accesses memory outside the image buffer. It is fixed in TensorFlow 2.5.0 and backported to 2.4.2, 2.3.3, 2.2.3, and 2.1.4.

Is CVE-2021-29529 actively exploited?

No confirmed active exploitation has been reported, and CVE-2021-29529 is not listed in CISA KEV. Proof-of-concept code is publicly available, however, which lowers the barrier to exploitation.

How to fix CVE-2021-29529?

1. Patch: upgrade to TensorFlow 2.5.0 or later, or to a patched maintenance release (2.4.2, 2.3.3, 2.2.3, or 2.1.4).
2. Inventory: scan all ML servers, containers, and CI/CD pipelines for vulnerable TensorFlow versions using `pip show tensorflow` or OCI image scanning.
3. Isolate: in multi-tenant environments, enforce container/VM-level isolation so TF processes cannot cross trust boundaries.
4. Restrict: limit direct access to raw TF ops APIs in multi-tenant inference services; prefer high-level Keras APIs that validate inputs.
5. Detect: alert on unexpected memory errors (SIGSEGV, heap corruption logs) from TF serving processes as potential exploit indicators.

What systems are affected by CVE-2021-29529?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, edge/mobile inference, image preprocessing pipelines.

What is the CVSS score for CVE-2021-29529?

CVE-2021-29529 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.25%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, edge/mobile inference, image preprocessing pipelines

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0043.003 Manual Modification

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art.15

ISO 42001: A.6.2

NIST AI RMF: MANAGE-2.2

OWASP LLM Top 10: LLM03:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a heap buffer overflow in `tf.raw_ops.QuantizedResizeBilinear` by manipulating input values so that float rounding results in off-by-one error in accessing image elements. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/44b7f486c0143f68b56c34e2d01e146ee445134a/tensorflow/core/kernels/quantized_resize_bilinear_op.cc#L62-L66) computes two integers (representing the upper and lower bounds for interpolation) by ceiling and flooring a floating point value. For some values of `in`, `interpolation->upper[i]` might be smaller than `interpolation->lower[i]`. This is an issue if `interpolation->upper[i]` is capped at `in_size-1` as it means that `interpolation->lower[i]` points outside of the image. Then, in the interpolation code (https://github.com/tensorflow/tensorflow/blob/44b7f486c0143f68b56c34e2d01e146ee445134a/tensorflow/core/kernels/quantized_resize_bilinear_op.cc#L245-L264), this would result in heap buffer overflow. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
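The bound inversion described in the advisory can be modeled in a few lines. This is a simplified Python illustration, not TensorFlow's C++ kernel: the function names are hypothetical, the sample value 4.0000005 is a made-up stand-in for a coordinate that float rounding pushes just past the last valid index, and the clamp in the hardened variant is one plausible mitigation shown for contrast, not a claim about the exact upstream patch.

```python
import math


def compute_bounds(in_value, in_size):
    """Model of the flawed logic: floor for lower, ceil capped for upper."""
    lower = max(math.floor(in_value), 0)
    upper = min(math.ceil(in_value), in_size - 1)
    return lower, upper


def compute_bounds_hardened(in_value, in_size):
    """Same computation, but lower is never allowed to exceed upper."""
    lower, upper = compute_bounds(in_value, in_size)
    return min(lower, upper), upper


if __name__ == "__main__":
    in_size = 4  # valid indices are 0..3

    # Ordinary coordinate: both bounds stay inside the image.
    print(compute_bounds(1.5, in_size))

    # Hypothetical coordinate nudged past the edge by rounding:
    # upper is capped at in_size - 1, lower is not.
    lower, upper = compute_bounds(4.0000005, in_size)
    print(lower, upper, "out of range" if lower >= in_size else "ok")

    # With the clamp, lower falls back to the last valid index.
    print(compute_bounds_hardened(4.0000005, in_size))
```

The key observation is the asymmetry: only `upper` is capped at `in_size - 1`, so any coordinate at or beyond `in_size` leaves `lower` pointing past the end of the row while `upper` still looks valid.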

Exploitation Scenario

An attacker with local access to a shared ML inference server or GPU training cluster constructs a crafted image batch where specific floating-point pixel coordinate values cause ceiling and floor interpolation bounds to invert after rounding. When fed to a quantized computer vision model using QuantizedResizeBilinear in its preprocessing graph, this triggers a heap buffer read/write out of bounds. In a Kubeflow or MLflow multi-tenant environment, the attacker escalates from a low-privilege notebook session to code execution in the inference process, potentially accessing other tenants' model weights, API keys stored as environment variables, or the underlying host via container escape primitives.
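To gauge whether a deployed model is exposed to this scenario, one option is to check its graph for the affected op before serving it. The sketch below is a hedged illustration: `find_risky_nodes` and `RISKY_OPS` are hypothetical names, and the function works on plain `(node_name, op_type)` pairs so it runs without TensorFlow. With TensorFlow available, such pairs could be built from a loaded `GraphDef`, whose nodes carry `name` and `op` fields.

```python
# Op types considered risky on unpatched TensorFlow builds (from the advisory).
RISKY_OPS = frozenset({"QuantizedResizeBilinear"})


def find_risky_nodes(nodes):
    """Return names of nodes whose op type is in RISKY_OPS.

    nodes: iterable of (node_name, op_type) pairs, for example built as
    [(n.name, n.op) for n in graph_def.node] from a TensorFlow GraphDef.
    """
    return [name for name, op_type in nodes if op_type in RISKY_OPS]


if __name__ == "__main__":
    # Hypothetical node list for a quantized vision model.
    example_nodes = [
        ("input", "Placeholder"),
        ("preprocess/resize", "QuantizedResizeBilinear"),
        ("conv1", "QuantizedConv2D"),
    ]
    flagged = find_risky_nodes(example_nodes)
    if flagged:
        print("review before serving on unpatched hosts:", flagged)
```

A check like this is an inventory aid only. It does not make an unpatched runtime safe, because a user with access to raw ops can invoke the kernel directly without going through a stored graph.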

Weaknesses (CWE)

CWE-193 Off-by-one Error (Primary)
CWE-131 Incorrect Calculation of Buffer Size

CWE-193 — Off-by-one Error: A product calculates or uses an incorrect maximum or minimum value that is 1 more, or 1 less, than the correct value.

[Implementation] When copying character arrays or using character manipulation methods, the correct size parameter must be used to account for the null terminator that needs to be added at the end of the array. Some examples of functions susceptible to this weakness in C include strcpy(), strncpy(), strcat(), strncat(), printf(), sprintf(), scanf() and sscanf().

Source: MITRE CWE corpus.