CVE-2021-29583: TensorFlow heap overflow

CISO Take

A heap buffer overflow in TensorFlow's FusedBatchNorm op lets low-privileged users trigger out-of-bounds reads or null pointer dereferences via malformed tensor inputs, with potential for code execution. Upgrade to TensorFlow 2.5.0 or a patched backport (2.4.2, 2.3.3, 2.2.3, or 2.1.4) immediately. Multi-tenant ML training clusters and shared inference infrastructure are the highest-risk environments.

What is the risk?

The CVSS 3.1 base score is 7.8 (HIGH), with a local attack vector and a low privilege requirement. Although the attack is local-only, shared ML training clusters and multi-tenant GPU platforms expose this surface to non-admin users who can submit arbitrary tensor inputs. The combination of C:H/I:H/A:H impact and low attack complexity makes this a priority patch wherever TensorFlow runs with multi-user access. The CVE dates from 2021 and is not in CISA KEV, so active targeting is unlikely, but unpatched legacy TensorFlow deployments remain at risk.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	< 2.5.0 (without the backported fix)	2.5.0; backports 2.4.2, 2.3.3, 2.2.3, 2.1.4

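Do you use TensorFlow? You are affected if your version predates 2.5.0 and lacks the backported fix (2.4.2, 2.3.3, 2.2.3, 2.1.4, or a later patch release on those branches).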
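A minimal sketch of that version check, assuming the patched releases are exactly those named in the advisory (2.5.0 and the 2.4.2, 2.3.3, 2.2.3, 2.1.4 backports). The helper name `is_vulnerable` and the `PATCHED` table are hypothetical, and pre-release suffixes such as `rc0` are ignored, so treat release candidates with extra care.

```python
import re

# First fixed patch release on each supported branch, per the advisory text.
PATCHED = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def is_vulnerable(version: str) -> bool:
    """Return True if `version` predates the FusedBatchNorm fix.

    Only the leading MAJOR.MINOR.PATCH digits are parsed; suffixes such as
    "rc1" are ignored (an assumption of this sketch).
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())

    if (major, minor) >= (2, 5):
        return False  # 2.5.0 and later include the fix
    fixed_patch = PATCHED.get((major, minor))
    if fixed_patch is None:
        return True  # branch never received a backport
    return patch < fixed_patch
```

In an environment with TensorFlow installed, the check could be applied as `is_vulnerable(tf.__version__)`.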

How severe is it?

CVSS 3.1: 7.8 / 10 (HIGH)

EPSS: 0.2% chance of exploitation in 30 days (higher than 11% of all CVEs)

Source: EPSS v3, FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication: Moderate

Exploitation Confidence: Medium

Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

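Attack Vector (AV): Local

Attack Complexity (AC): Low

Privileges Required (PR): Low

User Interaction (UI): None

Scope (S): Unchanged

Confidentiality (C): High

Integrity (I): High

Availability (A): High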
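The 7.8 base score follows from these metrics. The sketch below reproduces the arithmetic using the metric weights and rounding rule from the CVSS v3.1 specification; it handles only the Scope Unchanged case, and the function and table names are illustrative rather than part of any official tooling.

```python
import math

# Numeric weights from the CVSS v3.1 specification (Scope Unchanged values).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 spec."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Compute the CVSS v3.1 base score for a Scope Unchanged vector."""
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[c]) * (1 - cia[i]) * (1 - cia[a])
    impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][av] * WEIGHTS["AC"][ac]
        * WEIGHTS["PR"][pr] * WEIGHTS["UI"][ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:L / AC:L / PR:L / UI:N / S:U / C:H / I:H / A:H
score = base_score_scope_unchanged("L", "L", "L", "N", "H", "H", "H")
print(score)  # 7.8
```

Here the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum, roughly 7.71, rounds up to 7.8.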

What should I do?

Patch: Upgrade to TensorFlow 2.5.0 or apply cherrypicks for supported branches (2.4.2, 2.3.3, 2.2.3, 2.1.4) per commit 6972f9d.
Isolate: Run training workers and TF Serving in containers with seccomp/AppArmor profiles and minimal privileges.
Input validation: Assert tensor dimension consistency (channel counts of scale, offset, mean, variance match x) before executing FusedBatchNorm ops.
Monitor: Alert on TensorFlow process crashes (SIGSEGV/SIGABRT) as exploitation indicators.
Network isolation: Restrict TF Serving endpoints to internal networks; never expose raw op execution to untrusted external callers.
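The input validation step can be sketched as a shape check that runs before any FusedBatchNorm call. This is a minimal illustration, not the check TensorFlow itself applies: the function name `validate_fused_batch_norm_shapes` is hypothetical, it works on plain shape tuples for 4-D inputs only, and rejecting empty `x` tensors outright is a conservative assumption. Pass `None` for `mean_shape` and `variance_shape` when the op does not require them.

```python
def validate_fused_batch_norm_shapes(x_shape, scale_shape, offset_shape,
                                     mean_shape=None, variance_shape=None,
                                     data_format="NHWC"):
    """Raise ValueError unless every parameter vector matches x's channels.

    Shapes are concrete tuples of ints. Returns the channel count on success.
    """
    if data_format not in ("NHWC", "NCHW"):
        raise ValueError(f"unsupported data_format: {data_format!r}")
    if len(x_shape) != 4:
        raise ValueError(f"x must be 4-D, got shape {tuple(x_shape)}")
    # Conservative assumption of this sketch: refuse empty inputs entirely.
    if any(dim == 0 for dim in x_shape):
        raise ValueError(f"x must not be empty, got shape {tuple(x_shape)}")

    channels = x_shape[3] if data_format == "NHWC" else x_shape[1]

    params = {"scale": scale_shape, "offset": offset_shape}
    if mean_shape is not None:
        params["mean"] = mean_shape
    if variance_shape is not None:
        params["variance"] = variance_shape

    for name, shape in params.items():
        # Each parameter must be a 1-D vector with one entry per channel.
        if tuple(shape) != (channels,):
            raise ValueError(
                f"{name} has shape {tuple(shape)}, expected ({channels},)"
            )
    return channels
```

A serving or job-submission layer could call this on the shapes of incoming tensors and reject the request on `ValueError`, so that mismatched inputs never reach the native kernel.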

How is it classified?

Code Execution, DoS, Framework, Inference

AML.T0001 - Search Open AI Vulnerability Analysis

AML.T0010.001 - AI Software

AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 9 - Risk management system

ISO 42001

A.9.1 - Information security policies for AI systems

NIST AI RMF

MANAGE-2.2 - Mechanisms to sustain the value of deployed AI

OWASP LLM Top 10

LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2021-29583?

CVE-2021-29583 is a heap buffer overflow in TensorFlow's `tf.raw_ops.FusedBatchNorm` op. The implementation does not check that `scale`, `offset`, `mean`, and `variance` match the channel count of `x`, so malformed or empty tensors cause out-of-bounds heap reads or null pointer dereferences. It is fixed in TensorFlow 2.5.0 and in the backports 2.4.2, 2.3.3, 2.2.3, and 2.1.4.

Is CVE-2021-29583 actively exploited?

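CVE-2021-29583 is not listed in CISA KEV, and active targeting is considered unlikely. However, proof-of-concept exploit code is publicly available (indexed in trickest/cve), which increases the risk of exploitation.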

How to fix CVE-2021-29583?

1. Patch: Upgrade to TensorFlow 2.5.0 or apply cherrypicks for supported branches (2.4.2, 2.3.3, 2.2.3, 2.1.4) per commit 6972f9d.
2. Isolate: Run training workers and TF Serving in containers with seccomp/AppArmor profiles and minimal privileges.
3. Input validation: Assert tensor dimension consistency (channel counts of scale, offset, mean, variance match x) before executing FusedBatchNorm ops.
4. Monitor: Alert on TensorFlow process crashes (SIGSEGV/SIGABRT) as exploitation indicators.
5. Network isolation: Restrict TF Serving endpoints to internal networks; never expose raw op execution to untrusted external callers.
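For the monitoring step, one possible approach is to inspect the exit status of each worker process. The sketch below relies on Python's `subprocess` convention that, on POSIX, a negative return code `-N` means the child was terminated by signal `N`. The names `crash_signal_name` and `run_and_check` are hypothetical, and how an alert is actually raised is left to the surrounding tooling.

```python
import signal
import subprocess

# Signals treated as possible exploitation indicators in this sketch.
CRASH_SIGNALS = {signal.SIGSEGV, signal.SIGABRT}


def crash_signal_name(returncode):
    """Return the signal name if the process died from a watched signal.

    Returns None for normal exits, non-zero exit codes, and other signals.
    """
    if returncode is None or returncode >= 0:
        return None
    try:
        sig = signal.Signals(-returncode)
    except ValueError:
        return None  # not a signal number known on this platform
    return sig.name if sig in CRASH_SIGNALS else None


def run_and_check(cmd):
    """Run a worker command and report a watched crash signal, if any."""
    completed = subprocess.run(cmd)
    name = crash_signal_name(completed.returncode)
    if name is not None:
        # Placeholder for a real alerting hook.
        print(f"ALERT: {cmd[0]} terminated by {name}")
    return name
```

A crash alone does not prove exploitation, so such alerts are best correlated with who submitted the job and what inputs it carried.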

What systems are affected by CVE-2021-29583?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving.

What is the CVSS score for CVE-2021-29583?

CVE-2021-29583 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.21%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving

MITRE ATLAS Techniques

AML.T0001 Search Open AI Vulnerability Analysis

AML.T0010.001 AI Software

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 9

ISO 42001: A.9.1

NIST AI RMF: MANAGE-2.2

OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. The implementation of `tf.raw_ops.FusedBatchNorm` is vulnerable to a heap buffer overflow. If the tensors are empty, the same implementation can trigger undefined behavior by dereferencing null pointers.

The implementation (https://github.com/tensorflow/tensorflow/blob/57d86e0db5d1365f19adcce848dfc1bf89fdd4c7/tensorflow/core/kernels/fused_batch_norm_op.cc) fails to validate that `scale`, `offset`, `mean` and `variance` (the last two only when required) all have the same number of elements as the number of channels of `x`. This results in heap out of bounds reads when the buffers backing these tensors are indexed past their boundary.

If the tensors are empty, the validation mentioned in the above paragraph would also trigger and prevent the undefined behavior. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.

Exploitation Scenario

An adversary with data scientist access to a shared Kubernetes GPU cluster submits a training job containing a model with FusedBatchNorm layers fed empty or channel-mismatched tensors. The missing dimension validation triggers a heap out-of-bounds read, which can crash the worker or expose adjacent heap memory. If the adversary can combine this with a crafted heap layout to reach code execution within the training worker pod, lateral movement becomes possible: access to other tenants' model artifacts, training data, environment credentials, or cloud IAM tokens mounted in the pod.

Weaknesses (CWE)

CWE-125 Out-of-bounds Read (Primary)

CWE-476 NULL Pointer Dereference (Primary)

CWE-125 — Out-of-bounds Read: The product reads data past the end, or before the beginning, of the intended buffer.

[Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation. However, denylis
[Architecture and Design] Use a language that provides appropriate memory abstractions.

Source: MITRE CWE corpus.