CVE-2021-29578: TensorFlow: heap buffer overflow in FractionalAvgPoolGrad

HIGH PoC AVAILABLE
Published May 14, 2021
CISO Take

TensorFlow releases before 2.1.4, and 2.2.x, 2.3.x, and 2.4.x releases before 2.2.3, 2.3.3, and 2.4.2 respectively, contain a heap buffer overflow in FractionalAvgPoolGrad caused by missing bounds validation on the pooling-sequence inputs. In shared ML training environments (Jupyter hubs, GPU clusters, Kubeflow), 'local access' effectively means any authenticated user, which makes this a credible privilege escalation path. Patch to TF 2.5.0 or the matching backport immediately, and isolate training workloads in separate containers as a compensating control.

Risk Assessment

The CVSS score of 7.8 (High) carries a local attack vector, which limits internet-facing exposure, but multi-tenant ML platforms routinely grant 'local' access to many users. There is no known active exploitation and no CISA KEV entry. Because the CVE was published in May 2021, any system that is still unpatched has been exposed for more than three years. High confidentiality, integrity, and availability impact (C:H/I:H/A:H) makes this a full-compromise vector within the scope of the affected process. Risk is elevated for any organization running shared training infrastructure where users can submit arbitrary model code.

Affected Systems

Package: tensorflow
Ecosystem: pip
Vulnerable range: < 2.1.4; >= 2.2.0, < 2.2.3; >= 2.3.0, < 2.3.3; >= 2.4.0, < 2.4.2
Patched: 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

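Do you use tensorflow? You're affected if your installed version is older than 2.1.4, or is a 2.2.x, 2.3.x, or 2.4.x release older than 2.2.3, 2.3.3, or 2.4.2 respectively.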
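A quick way to check an environment against these ranges is to compare the installed release number with the patched versions listed above. The sketch below is illustrative: `VULNERABLE_RANGES`, `parse_release`, and `is_vulnerable` are hypothetical names, pre-release suffixes such as `rc0` are simply ignored, and it only looks up the `tensorflow` distribution, so variant packages such as `tensorflow-gpu` would need a separate lookup.

```python
import importlib.metadata as metadata
import re

# Affected releases as (inclusive lower, exclusive upper) bounds, derived from
# the patched versions in this advisory: 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0.
VULNERABLE_RANGES = [
    ((0, 0, 0), (2, 1, 4)),
    ((2, 2, 0), (2, 2, 3)),
    ((2, 3, 0), (2, 3, 3)),
    ((2, 4, 0), (2, 4, 2)),
]


def parse_release(version: str) -> tuple:
    """Turn '2.4.1' into (2, 4, 1); missing components count as 0.

    Suffixes such as 'rc0' or '.dev' are dropped, so a pre-release is treated
    like the final release it precedes (a deliberate simplification).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_vulnerable(version: str) -> bool:
    """True if the release falls inside one of the affected ranges."""
    release = parse_release(version)
    return any(low <= release < high for low, high in VULNERABLE_RANGES)


if __name__ == "__main__":
    try:
        installed = metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        print("tensorflow is not installed in this environment")
    else:
        verdict = "AFFECTED" if is_vulnerable(installed) else "not in the affected ranges"
        print(f"tensorflow {installed}: {verdict}")
```

Comparing tuples keeps the range logic to one line per branch; a production check would more likely use a full version parser such as the one in the `packaging` library.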

Severity & Risk

CVSS 3.1: 7.8 / 10
EPSS: 0.01% chance of exploitation in the next 30 days (higher than 2% of all CVEs)
Exploitation status: Exploit available
Exploitation: Medium
Sophistication: Moderate
Exploitation confidence: Medium (public PoC indexed in trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

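Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High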

Recommended Action

  1. Upgrade to TensorFlow 2.5.0. If you are constrained to an earlier branch, use the patched backport for it: 2.4.2, 2.3.3, 2.2.3, or 2.1.4.
  2. Run training jobs in isolated, single-use containers, and never share a TensorFlow process across trust boundaries.
  3. Restrict who can submit tf.raw_ops calls on shared platforms; if feasible, enforce an allowlist of approved ops.
  4. Audit model code for direct use of tf.raw_ops.FractionalAvgPoolGrad with externally controlled tensor shapes.
  5. Keep OS-level exploit mitigations such as ASLR enabled on training hosts. They raise the cost of turning the heap overflow into code execution but do not remove the bug.
  6. Rotate any secrets (cloud credentials, API tokens) that were accessible to training workers on unpatched systems.
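For the code audit in step 4, a static scan of Python sources can flag every place the op name appears. This is a minimal sketch using the standard-library `ast` module; `find_references` and `scan_tree` are hypothetical helpers, and the scan is heuristic: op names assembled at runtime, or calls made from non-Python code, will not be found.

```python
import ast
import sys
from pathlib import Path

TARGET = "FractionalAvgPoolGrad"


def find_references(source: str, filename: str = "<string>") -> list:
    """Return sorted (line, column) pairs where TARGET appears in Python source.

    Matches attribute access (tf.raw_ops.FractionalAvgPoolGrad), bare names,
    'from ... import FractionalAvgPoolGrad', and string literals such as
    getattr(tf.raw_ops, "FractionalAvgPoolGrad").
    """
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Attribute) and node.attr == TARGET:
            hits.append((node.lineno, node.col_offset))
        elif isinstance(node, ast.Name) and node.id == TARGET:
            hits.append((node.lineno, node.col_offset))
        elif isinstance(node, ast.Constant) and node.value == TARGET:
            hits.append((node.lineno, node.col_offset))
        elif isinstance(node, ast.ImportFrom) and any(
            alias.name == TARGET for alias in node.names
        ):
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)


def scan_tree(root: Path) -> dict:
    """Map each .py file under root to its hits, skipping unparsable files."""
    results = {}
    for path in sorted(root.rglob("*.py")):
        try:
            hits = find_references(path.read_text(encoding="utf-8"), str(path))
        except (SyntaxError, ValueError):  # ValueError covers decode errors
            continue
        if hits:
            results[str(path)] = hits
    return results


if __name__ == "__main__":
    # Usage: python scan_fapg.py path/to/repo [more/paths ...]
    for root in sys.argv[1:]:
        for path, hits in scan_tree(Path(root)).items():
            for line, col in hits:
                print(f"{path}:{line}:{col}: references {TARGET}")
```

Parsing instead of grepping avoids false positives in comments, while still catching the `getattr` string form. Each hit still needs a human to decide whether the tensor shapes reaching it are externally controlled.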

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.10.1 - AI system security and resilience
NIST AI RMF
MANAGE 2.2 - Mechanisms are in place to sustain AI risk management practices
OWASP LLM Top 10
LLM03:2025 - Supply Chain

Frequently Asked Questions

What is CVE-2021-29578?

CVE-2021-29578 is a heap buffer overflow in TensorFlow's tf.raw_ops.FractionalAvgPoolGrad kernel. The kernel does not check that the row and column pooling sequences contain enough elements for the shape of the out_backprop tensor, so crafted inputs make it access memory outside the allocated buffers. It is fixed in TensorFlow 2.5.0 and in the 2.4.2, 2.3.3, 2.2.3, and 2.1.4 backports.

Is CVE-2021-29578 actively exploited?

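No active exploitation has been reported, and the CVE is not listed in CISA's Known Exploited Vulnerabilities (KEV) catalog. However, proof-of-concept exploit code is publicly available (indexed in trickest/cve), which lowers the bar for exploitation.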

How to fix CVE-2021-29578?

Upgrade to TensorFlow 2.5.0, or to the patched release for your branch (2.4.2, 2.3.3, 2.2.3, or 2.1.4). Where an immediate upgrade is not possible, isolate training jobs in single-use containers, restrict and audit use of tf.raw_ops.FractionalAvgPoolGrad, and rotate any secrets that unpatched training workers could reach.

What systems are affected by CVE-2021-29578?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, shared ML platforms, distributed training clusters, model experimentation environments, containerized ML workloads.

What is the CVSS score for CVE-2021-29578?

CVE-2021-29578 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.01%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. The implementation of `tf.raw_ops.FractionalAvgPoolGrad` is vulnerable to a heap buffer overflow. The implementation (https://github.com/tensorflow/tensorflow/blob/dcba796a28364d6d7f003f6fe733d82726dda713/tensorflow/core/kernels/fractional_avg_pool_op.cc#L216) fails to validate that the pooling sequence arguments have enough elements as required by the `out_backprop` tensor shape. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
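The missing validation can be modeled in a few lines of pure Python. This is a sketch of the invariant the description points at, not TensorFlow's actual patch: it assumes (our reading of the linked kernel source) that for every output row `r` the kernel reads entries `r` and `r + 1` of the row pooling sequence, and likewise for columns, so each sequence needs one more element than the matching `out_backprop` dimension. `validate_pooling_sequences` is a hypothetical name.

```python
from typing import Sequence


def validate_pooling_sequences(
    out_backprop_shape: Sequence[int],
    row_pooling_sequence: Sequence[int],
    col_pooling_sequence: Sequence[int],
) -> None:
    """Raise ValueError when the pooling sequences are too short.

    Output row r is pooled from the input rows bounded by
    row_pooling_sequence[r] and row_pooling_sequence[r + 1], so the last
    output row reads entry out_rows. A sequence therefore needs at least
    out_rows + 1 elements (and likewise for columns).
    """
    if len(out_backprop_shape) != 4:
        raise ValueError("out_backprop must be 4-D: [batch, rows, cols, depth]")
    _, out_rows, out_cols, _ = out_backprop_shape
    if len(row_pooling_sequence) <= out_rows:
        raise ValueError(
            f"row_pooling_sequence has {len(row_pooling_sequence)} elements; "
            f"out_backprop with {out_rows} rows needs at least {out_rows + 1}"
        )
    if len(col_pooling_sequence) <= out_cols:
        raise ValueError(
            f"col_pooling_sequence has {len(col_pooling_sequence)} elements; "
            f"out_backprop with {out_cols} columns needs at least {out_cols + 1}"
        )
```

For an `out_backprop` of shape `[1, 3, 3, 1]`, sequences such as `[0, 1, 2, 3]` pass, while a three-element row sequence is rejected. Without a check of this kind, the kernel's final iteration indexes one element past the end of the sequence buffer, which is the out-of-bounds access the advisory describes.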

Exploitation Scenario

An attacker with access to a shared ML training cluster submits a crafted training job whose model calls tf.raw_ops.FractionalAvgPoolGrad with pooling_sequence arguments deliberately shorter than the out_backprop tensor shape requires. Because fractional_avg_pool_op.cc never checks their length, the kernel accesses memory past the end of the allocated heap buffers. On an unpatched host, this memory corruption may be developed into code execution within the training worker process, letting the attacker exfiltrate other users' model checkpoints, read training datasets, or harvest cloud credentials stored in environment variables, then pivot laterally within the ML infrastructure.
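Where a platform accepts models as serialized graphs (for example SavedModels) rather than arbitrary Python, the op allowlist recommended above can be enforced before any job runs. This is a sketch under that assumption: `BLOCKED_OPS`, `ops_in_graph_def`, `rejected_ops`, and `ops_in_saved_model` are hypothetical names, and `ops_in_saved_model` needs TensorFlow installed for its protobuf definitions. It cannot see ops that Python code builds at runtime, so it complements process isolation rather than replacing it.

```python
from pathlib import Path
from typing import Iterable, Set

# Op types refused regardless of the allowlist; FractionalAvgPoolGrad is the
# op named in CVE-2021-29578.
BLOCKED_OPS = frozenset({"FractionalAvgPoolGrad"})


def ops_in_graph_def(graph_def) -> Set[str]:
    """Collect op type names from a GraphDef, including function bodies."""
    ops = {node.op for node in graph_def.node}
    for function in graph_def.library.function:
        ops.update(node.op for node in function.node_def)
    return ops


def rejected_ops(graph_def, allowlist: Iterable[str]) -> Set[str]:
    """Ops missing from the allowlist, plus any blocked op that is present."""
    ops = ops_in_graph_def(graph_def)
    return (ops - set(allowlist)) | (ops & BLOCKED_OPS)


def ops_in_saved_model(export_dir: str) -> Set[str]:
    """Parse saved_model.pb without executing it and list every op type."""
    from tensorflow.core.protobuf import saved_model_pb2  # requires TensorFlow

    saved_model = saved_model_pb2.SavedModel()
    saved_model.ParseFromString((Path(export_dir) / "saved_model.pb").read_bytes())
    ops: Set[str] = set()
    for meta_graph in saved_model.meta_graphs:
        ops |= ops_in_graph_def(meta_graph.graph_def)
    return ops
```

For a `tf.function`, a GraphDef to screen can be obtained with `concrete_fn.graph.as_graph_def()`. Intersecting with `BLOCKED_OPS` means a mistaken allowlist entry still cannot re-enable the vulnerable op.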

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
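The 7.8 base score follows from this vector through the CVSS v3.1 base-score equations. A minimal sketch, assuming Scope Unchanged (as here) so the simpler impact formula and the unchanged-scope Privileges Required weights apply; vectors with `S:C` are rejected rather than scored. The function names are illustrative.

```python
import math

# CVSS v3.1 metric weights (Privileges Required values are for Scope Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def parse_vector(vector: str) -> dict:
    """Split 'CVSS:3.1/AV:L/...' into {'AV': 'L', ...}."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"not a CVSS 3.1 vector: {vector!r}")
    return dict(part.split(":", 1) for part in parts)


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    if m["S"] != "U":
        raise ValueError("this sketch only handles Scope Unchanged (S:U)")
    iss = 1 - ((1 - WEIGHTS["C"][m["C"]])
               * (1 - WEIGHTS["I"][m["I"]])
               * (1 - WEIGHTS["A"][m["A"]]))
    impact = 6.42 * iss
    exploitability = (8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                      * WEIGHTS["PR"][m["PR"]] * WEIGHTS["UI"][m["UI"]])
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.8
```

Worked through for this vector: ISS = 1 - 0.44^3 ≈ 0.9148, impact = 6.42 × ISS ≈ 5.873, exploitability = 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.835, and their sum ≈ 7.708 rounds up to 7.8. The local attack vector (0.55 instead of 0.85 for network) is what keeps the score below the 9.8 that the same impact would earn over the network.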

Timeline

Published
May 14, 2021
Last Modified
November 21, 2024
First Seen
May 14, 2021

Related Vulnerabilities