CVE-2021-29529: TensorFlow: heap buffer overflow in quantized image resize

HIGH PoC AVAILABLE
Published May 14, 2021
CISO Take

Upgrade TensorFlow to 2.5.0 (or 2.4.2/2.3.3/2.2.3/2.1.4) on every system running quantized image processing pipelines. Local exploit with low complexity means a malicious insider or compromised ML worker node can achieve arbitrary code execution within the TensorFlow process. Priority is highest in shared multi-tenant ML environments like Jupyter hubs, MLflow servers, and GPU training clusters.

What is the risk?

CVSS 7.8 High, but the local-only attack vector prevents direct remote exploitation, limiting realistic exposure. Risk escalates sharply in shared ML compute environments where multiple principals have local access and blast radius expands. Quantized models are standard in edge and optimized inference deployments, broadening the affected surface area. Not in CISA KEV and no confirmed active exploitation, but the vulnerability is fully public with a disclosed PoC in the GitHub advisory.

What systems are affected?

Package: TensorFlow
Ecosystem: pip
Vulnerable Range: not listed
Patched: 2.5.0 (backports: 2.4.2, 2.3.3, 2.2.3, 2.1.4)

Do you use TensorFlow? You're affected.

How severe is it?

CVSS 3.1
7.8 / 10
EPSS
0.3%
chance of exploitation in 30 days
Higher than 16% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Moderate
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

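Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High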

What should I do?

5 steps
  1. Patch: upgrade to TensorFlow >=2.5.0, or to a patched backport release (2.4.2, 2.3.3, 2.2.3, or 2.1.4) if you must stay on an older release line.

  2. Inventory: scan all ML servers, containers, and CI/CD pipelines for vulnerable TensorFlow versions using 'pip show tensorflow' or OCI image scanning.

  3. Isolate: in multi-tenant environments, enforce container/VM-level isolation so TF processes cannot cross trust boundaries.

  4. Restrict: limit direct access to raw TF ops APIs in multi-tenant inference services; prefer high-level Keras APIs that validate inputs.

  5. Detect: alert on unexpected memory errors (SIGSEGV, heap corruption logs) from TF serving processes as potential exploit indicators.
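The inventory step can be partly automated. The following is a minimal sketch, assuming the fixed versions are exactly those named in the advisory and that pre-release suffixes (such as `rc0`) can be ignored; the helper names `is_patched` and `installed_tensorflow_version`, and the list of distribution names checked, are illustrative choices rather than part of any official tooling.

```python
import re
from importlib import metadata

# Minimum patched patch-level per supported release line, per the advisory text.
PATCHED_MIN_PATCH = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
# Every release line from 2.5 onward is assumed to contain the fix.
FIRST_FIXED_LINE = (2, 5)


def parse_version(version):
    """Extract (major, minor, patch) from a version string; suffixes are ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version):
    """Return True if the version is at or above a fixed release on its line."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= FIRST_FIXED_LINE:
        return True
    min_patch = PATCHED_MIN_PATCH.get((major, minor))
    # Release lines older than 2.1 received no backport, so treat them as unpatched.
    return min_patch is not None and patch >= min_patch


def installed_tensorflow_version():
    """Return the installed TensorFlow version string, or None if not installed."""
    # Distribution names are an assumption; adjust for your environment.
    for dist in ("tensorflow", "tensorflow-cpu", "tensorflow-gpu"):
        try:
            return metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    found = installed_tensorflow_version()
    if found is None:
        print("TensorFlow not installed in this environment")
    else:
        status = "patched" if is_patched(found) else "VULNERABLE"
        print(f"tensorflow {found}: {status}")
```

Running the script inside each container image or virtual environment gives a per-environment verdict; it does not replace OCI image scanning, which also catches copies of TensorFlow that are not on the active Python path.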

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Art.15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2 - AI system lifecycle — Vulnerability management
NIST AI RMF
MANAGE-2.2 - Mechanisms to sustain, update, and decommission AI systems
OWASP LLM Top 10
LLM03:2025 - Supply Chain

Frequently Asked Questions

What is CVE-2021-29529?

CVE-2021-29529 is a heap buffer overflow in TensorFlow's `tf.raw_ops.QuantizedResizeBilinear` op. Float rounding in the computation of interpolation bounds produces an off-by-one error, so the kernel can access image elements outside the input buffer. It is rated CVSS 7.8 (High) with a local attack vector, and it is fixed in TensorFlow 2.5.0 and in the 2.4.2, 2.3.3, 2.2.3, and 2.1.4 backport releases.

Is CVE-2021-29529 actively exploited?

No active exploitation has been confirmed, and CVE-2021-29529 is not listed in CISA KEV. However, proof-of-concept exploit code is publicly available, which increases the risk of exploitation.

How to fix CVE-2021-29529?

  1. Patch: upgrade to TensorFlow >=2.5.0, or to a patched backport release (2.4.2, 2.3.3, 2.2.3, or 2.1.4) if you must stay on an older release line.

  2. Inventory: scan all ML servers, containers, and CI/CD pipelines for vulnerable TensorFlow versions using 'pip show tensorflow' or OCI image scanning.

  3. Isolate: in multi-tenant environments, enforce container/VM-level isolation so TF processes cannot cross trust boundaries.

  4. Restrict: limit direct access to raw TF ops APIs in multi-tenant inference services; prefer high-level Keras APIs that validate inputs.

  5. Detect: alert on unexpected memory errors (SIGSEGV, heap corruption logs) from TF serving processes as potential exploit indicators.
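For the detection step, one low-effort signal is the exit status of worker processes. The sketch below assumes a POSIX host, where Python's `subprocess` reports death-by-signal as a negative return code; the set of "suspicious" signals and the helper names `classify_exit` and `run_and_check` are assumptions for illustration, and a crash alone does not prove an exploit attempt.

```python
import signal
import subprocess

# Signals that commonly accompany memory corruption in native code.
# SIGBUS is POSIX-only, so it is added only when the platform defines it.
SUSPICIOUS_SIGNALS = {signal.SIGSEGV, signal.SIGABRT}
if hasattr(signal, "SIGBUS"):
    SUSPICIOUS_SIGNALS.add(signal.SIGBUS)


def classify_exit(returncode):
    """Return the signal name if the exit looks like a memory fault, else None."""
    if returncode is None or returncode >= 0:
        return None  # still running, clean exit, or ordinary error code
    try:
        sig = signal.Signals(-returncode)
    except ValueError:
        return None  # negative code that does not map to a known signal
    return sig.name if sig in SUSPICIOUS_SIGNALS else None


def run_and_check(cmd):
    """Run a command to completion and report a suspicious terminating signal."""
    completed = subprocess.run(cmd)
    return classify_exit(completed.returncode)
```

A supervisor could call `run_and_check` around each inference worker and raise an alert whenever it returns a signal name, alongside whatever log-based heap corruption alerts are already in place.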

What systems are affected by CVE-2021-29529?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, edge/mobile inference, image preprocessing pipelines.

What is the CVSS score for CVE-2021-29529?

CVE-2021-29529 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.25%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, edge/mobile inference, image preprocessing pipelines

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0043.003 Manual Modification
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art.15
ISO 42001: A.6.2
NIST AI RMF: MANAGE-2.2
OWASP LLM Top 10: LLM03:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a heap buffer overflow in `tf.raw_ops.QuantizedResizeBilinear` by manipulating input values so that float rounding results in off-by-one error in accessing image elements. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/44b7f486c0143f68b56c34e2d01e146ee445134a/tensorflow/core/kernels/quantized_resize_bilinear_op.cc#L62-L66) computes two integers (representing the upper and lower bounds for interpolation) by ceiling and flooring a floating point value. For some values of `in`, `interpolation->upper[i]` might be smaller than `interpolation->lower[i]`. This is an issue if `interpolation->upper[i]` is capped at `in_size-1` as it means that `interpolation->lower[i]` points outside of the image. Then, in the interpolation code (https://github.com/tensorflow/tensorflow/blob/44b7f486c0143f68b56c34e2d01e146ee445134a/tensorflow/core/kernels/quantized_resize_bilinear_op.cc#L245-L264), this would result in heap buffer overflow.

The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
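The bounds inversion described in the advisory can be modelled in a few lines of Python. This is a simplified sketch, not the TensorFlow kernel: it takes the scaled coordinate `in_coord` directly instead of reproducing the kernel's scaler, and the clamped variant is a hypothetical hardening shown for contrast, not necessarily the upstream fix. The function names are illustrative.

```python
import math


def interpolation_bounds(in_coord, in_size):
    """Model the bounds as described: floor for lower, capped ceil for upper."""
    lower = max(math.floor(in_coord), 0)
    # Only the upper bound is capped at the last valid index.
    upper = min(math.ceil(in_coord), in_size - 1)
    return lower, upper


def interpolation_bounds_clamped(in_coord, in_size):
    """Hypothetical hardening: never let lower exceed the capped upper bound."""
    lower, upper = interpolation_bounds(in_coord, in_size)
    return min(lower, upper), upper


def in_bounds(index, in_size):
    """True if index is a valid position in a row of in_size elements."""
    return 0 <= index < in_size


if __name__ == "__main__":
    in_size = 4
    for coord in (1.5, 3.0, 4.0):
        lower, upper = interpolation_bounds(coord, in_size)
        print(coord, (lower, upper), "lower valid:", in_bounds(lower, in_size))
```

With `in_size = 4`, a coordinate of 1.5 gives the expected pair (1, 2), while a coordinate that lands on 4.0 gives lower = 4 and upper = 3: the bounds are inverted and `lower` indexes one element past the end of the row, which is the off-by-one access the advisory describes.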

Exploitation Scenario

An attacker with local access to a shared ML inference server or GPU training cluster constructs a crafted image batch where specific floating-point pixel coordinate values cause the ceiling and floor interpolation bounds to invert after rounding. When fed to a quantized computer vision model using QuantizedResizeBilinear in its preprocessing graph, this triggers an out-of-bounds heap read or write. In a Kubeflow or MLflow multi-tenant environment, the attacker escalates from a low-privilege notebook session to code execution in the inference process, potentially accessing other tenants' model weights, API keys stored as environment variables, or the underlying host via container escape primitives.

Weaknesses (CWE)

CWE-193 — Off-by-one Error: A product calculates or uses an incorrect maximum or minimum value that is 1 more, or 1 less, than the correct value.

  • [Implementation] When copying character arrays or using character manipulation methods, the correct size parameter must be used to account for the null terminator that needs to be added at the end of the array. Some examples of functions susceptible to this weakness in C include strcpy(), strncpy(), strcat(), strncat(), printf(), sprintf(), scanf() and sscanf().

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
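To see how this vector yields the 7.8 base score, the CVSS v3.1 base formula can be applied directly. The sketch below uses the metric weights and rounding rule from the CVSS v3.1 specification but, as a simplifying assumption, handles only vectors with Scope: Unchanged; `base_score` and `roundup` are illustrative names, not a published library API.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place using the spec's integer-based rule."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    metrics = dict(part.split(":") for part in parts[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch handles Scope: Unchanged only")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    # Impact sub-score: 6.42 * ISS when scope is unchanged.
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))
```

For this CVE the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum of roughly 7.71 rounds up to 7.8. The local attack vector (weight 0.55 instead of 0.85 for network) is what keeps the score below the Critical range.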

Timeline

Published
May 14, 2021
Last Modified
November 21, 2024
First Seen
May 14, 2021

Related Vulnerabilities