CVE-2021-29529: TensorFlow: heap buffer overflow in quantized image resize

HIGH PoC AVAILABLE
Published May 14, 2021
CISO Take

Upgrade TensorFlow to 2.5.0 (or 2.4.2/2.3.3/2.2.3/2.1.4) on every system running quantized image processing pipelines. Local exploit with low complexity means a malicious insider or compromised ML worker node can achieve arbitrary code execution within the TensorFlow process. Priority is highest in shared multi-tenant ML environments like Jupyter hubs, MLflow servers, and GPU training clusters.

Risk Assessment

CVSS 7.8 High, but the local-only attack vector prevents direct remote exploitation, limiting realistic exposure. Risk escalates sharply in shared ML compute environments where multiple principals have local access and blast radius expands. Quantized models are standard in edge and optimized inference deployments, broadening the affected surface area. Not in CISA KEV and no confirmed active exploitation, but the vulnerability is fully public with a disclosed PoC in the GitHub advisory.

Affected Systems

Package      Ecosystem   Vulnerable Range                                         Patched
tensorflow   pip         < 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1   2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

Do you use tensorflow? You're affected if the installed version predates the patched releases listed above.

Severity & Risk

CVSS 3.1
7.8 / 10
EPSS
0.03%
chance of exploitation in 30 days
Higher than 9% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Moderate
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV Local
AC Low
PR Low
UI None
S Unchanged
C High
I High
A High

Recommended Action

  1. Patch: upgrade to TensorFlow 2.5.0 or later, or to a patched maintenance release (2.4.2, 2.3.3, 2.2.3, or 2.1.4).

  2. Inventory: scan all ML servers, containers, and CI/CD pipelines for vulnerable TensorFlow versions using 'pip show tensorflow' or OCI image scanning.

  3. Isolate: in multi-tenant environments, enforce container/VM-level isolation so TF processes cannot cross trust boundaries.

  4. Restrict: limit direct access to raw TF ops APIs in multi-tenant inference services; prefer high-level Keras APIs that validate inputs.

  5. Detect: alert on unexpected memory errors (SIGSEGV, heap corruption logs) from TF serving processes as potential exploit indicators.
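The patch and inventory steps above can be partly automated. The following is a minimal sketch, not an official tool: the helper names `is_patched` and `installed_tensorflow_versions` are hypothetical, the patched versions come from the advisory text, and two choices are assumptions made here for safety: branches older than 2.1 and any pre-release or locally tagged version string are reported as unpatched. The list of distribution names checked is also an assumption and may need extending for custom builds.

```python
import re
from importlib import metadata

# Minimum fixed patch level per maintained branch, taken from the advisory text.
FIXED_PATCH = {(2, 4): 2, (2, 3): 3, (2, 2): 3, (2, 1): 4}
# Every release from this branch onward contains the fix.
FIRST_FIXED_BRANCH = (2, 5)


def is_patched(version: str) -> bool:
    """Return True if a plain X.Y.Z version string contains the fix."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        # Assumption: pre-release or local builds (e.g. "2.5.0rc1") are
        # treated as unpatched so that they get a manual review.
        return False
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) >= FIRST_FIXED_BRANCH:
        return True
    needed = FIXED_PATCH.get((major, minor))
    # Assumption: branches with no listed fix (older than 2.1) are unpatched.
    return needed is not None and patch >= needed


def installed_tensorflow_versions() -> dict:
    """Map installed TensorFlow distribution names to their versions."""
    found = {}
    # Assumption: these are the distribution names worth checking.
    for name in ("tensorflow", "tensorflow-cpu", "tensorflow-gpu"):
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found


if __name__ == "__main__":
    versions = installed_tensorflow_versions()
    if not versions:
        print("No TensorFlow distribution found in this environment.")
    for name, version in versions.items():
        status = "patched" if is_patched(version) else "NEEDS REVIEW"
        print(f"{name} {version}: {status}")
```

Running the script inside each virtual environment or container image gives a per-environment answer; it does not replace OCI image scanning for images that cannot be started.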

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Art.15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2 - AI system lifecycle — Vulnerability management
NIST AI RMF
MANAGE-2.2 - Mechanisms to sustain, update, and decommission AI systems
OWASP LLM Top 10
LLM03:2025 - Supply Chain

Frequently Asked Questions

What is CVE-2021-29529?

CVE-2021-29529 is a heap buffer overflow in TensorFlow's `tf.raw_ops.QuantizedResizeBilinear` op. Floating-point rounding in the interpolation bounds calculation can produce an off-by-one index, so the kernel accesses memory outside the input image. It is rated CVSS 7.8 (HIGH) with a local attack vector and is fixed in TensorFlow 2.5.0 and in the 2.4.2, 2.3.3, 2.2.3, and 2.1.4 maintenance releases.

Is CVE-2021-29529 actively exploited?

No active exploitation has been confirmed, and CVE-2021-29529 is not listed in CISA KEV. Proof-of-concept code is publicly available, however, which increases the risk of exploitation.

How to fix CVE-2021-29529?

  1. Patch: upgrade to TensorFlow 2.5.0 or later, or to a patched maintenance release (2.4.2, 2.3.3, 2.2.3, or 2.1.4).

  2. Inventory: scan all ML servers, containers, and CI/CD pipelines for vulnerable TensorFlow versions using 'pip show tensorflow' or OCI image scanning.

  3. Isolate: in multi-tenant environments, enforce container/VM-level isolation so TF processes cannot cross trust boundaries.

  4. Restrict: limit direct access to raw TF ops APIs in multi-tenant inference services; prefer high-level Keras APIs that validate inputs.

  5. Detect: alert on unexpected memory errors (SIGSEGV, heap corruption logs) from TF serving processes as potential exploit indicators.

What systems are affected by CVE-2021-29529?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, edge/mobile inference, image preprocessing pipelines.

What is the CVSS score for CVE-2021-29529?

CVE-2021-29529 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.03%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a heap buffer overflow in `tf.raw_ops.QuantizedResizeBilinear` by manipulating input values so that float rounding results in an off-by-one error in accessing image elements. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/44b7f486c0143f68b56c34e2d01e146ee445134a/tensorflow/core/kernels/quantized_resize_bilinear_op.cc#L62-L66) computes two integers (representing the upper and lower bounds for interpolation) by ceiling and flooring a floating point value. For some values of `in`, `interpolation->upper[i]` might be smaller than `interpolation->lower[i]`. This is an issue if `interpolation->upper[i]` is capped at `in_size-1` as it means that `interpolation->lower[i]` points outside of the image. Then, in the interpolation code (https://github.com/tensorflow/tensorflow/blob/44b7f486c0143f68b56c34e2d01e146ee445134a/tensorflow/core/kernels/quantized_resize_bilinear_op.cc#L245-L264), this would result in heap buffer overflow. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
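The bounds inversion described above can be illustrated with a simplified model. This is a sketch of the description only, not the TensorFlow kernel: the function names are hypothetical, the coordinate value 15.0 is a made-up example of a rounded coordinate landing on `in_size`, and the guarded variant shows one plausible way to keep the lower bound in range rather than the actual upstream fix.

```python
import math


def bounds_unguarded(in_coord: float, in_size: int) -> tuple:
    """Model of the described computation: floor for lower, capped ceil for upper."""
    lower = max(math.floor(in_coord), 0)
    upper = min(math.ceil(in_coord), in_size - 1)  # only upper is capped
    return lower, upper


def bounds_guarded(in_coord: float, in_size: int) -> tuple:
    """Same computation, with lower additionally clamped to upper (illustrative guard)."""
    lower, upper = bounds_unguarded(in_coord, in_size)
    return min(lower, upper), upper


if __name__ == "__main__":
    in_size = 15  # valid indices are 0..14

    # A typical coordinate: both bounds stay inside the image.
    print(bounds_unguarded(3.4, in_size))

    # Hypothetical coordinate that rounding pushed onto in_size itself:
    # upper is capped to 14, but lower stays at 15, one past the last element.
    lower, upper = bounds_unguarded(15.0, in_size)
    print(lower, upper, "lower out of range:", lower > in_size - 1)

    # With the guard, lower can never exceed the capped upper bound.
    print(bounds_guarded(15.0, in_size))
```

In the unguarded model, any later read at index `lower` touches memory past the end of the row, which matches the out-of-bounds access the description attributes to the interpolation code.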

Exploitation Scenario

An attacker with local access to a shared ML inference server or GPU training cluster crafts inputs to QuantizedResizeBilinear so that the floating-point interpolation coordinates round to inverted lower and upper bounds. When a quantized computer vision model runs the op in its preprocessing graph, the kernel reads or writes heap memory outside the image buffer. In a Kubeflow or MLflow multi-tenant environment, the attacker could then escalate from a low-privilege notebook session to code execution in the inference process, potentially reaching other tenants' model weights, API keys stored as environment variables, or the underlying host via container escape primitives.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
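To show how this vector yields the 7.8 base score, here is a minimal sketch of the CVSS v3.1 base score calculation. The function names are hypothetical, only base metrics are handled (no temporal or environmental metrics), and the metric weights are those published in the CVSS v3.1 specification.

```python
import math

# Base metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is Unchanged (U) or Changed (C).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def parse_vector(vector: str) -> dict:
    """Split 'CVSS:3.1/AV:L/...' into a {metric: value} dictionary."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix}")
    return dict(part.split(":", 1) for part in parts)


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    scope = m["S"]
    # Impact sub-score.
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    # Exploitability sub-score.
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))
```

For this CVE the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum, 7.71, rounds up to 7.8. The local attack vector (weight 0.55 instead of 0.85 for network) is the main reason the score stays below the Critical band.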

Timeline

Published
May 14, 2021
Last Modified
November 21, 2024
First Seen
May 14, 2021

Related Vulnerabilities