CVE-2021-29607: TensorFlow heap OOB write

CISO Take

TensorFlow's SparseAdd kernel does not check that its input tensors are non-empty or that the second dimension of each indices tensor matches the size of the corresponding shape tensor, letting any low-privileged local user trigger null pointer dereferences and heap out-of-bounds writes. Shared ML infrastructure (Jupyter hubs, Kubeflow clusters, TF Serving endpoints accepting user-supplied sparse inputs) is the primary exposure surface. Patch to TensorFlow 2.5.0 or the corresponding cherrypick releases (2.4.2, 2.3.3, 2.2.3, 2.1.4) immediately; no workaround exists short of restricting access to TF compute environments.

What is the risk?

High (CVSS 7.8). Local attack vector bounds the exposure relative to network-exploitable bugs, but multi-tenant ML environments (shared GPU clusters, JupyterHub, Kubeflow) expand the practical attack surface considerably. Low complexity with no user interaction required means any authenticated tenant on a shared training cluster can reliably trigger the memory corruption primitive. Not in CISA KEV, indicating limited observed in-the-wild exploitation, but the heap OOB write is a serious primitive that can be chained toward code execution. AI/ML organizations running older TensorFlow branches (2.1–2.4) in production training pipelines carry the highest residual risk.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	< 2.1.4, 2.2.0 to 2.2.2, 2.3.0 to 2.3.2, 2.4.0 to 2.4.1	2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

Do you use TensorFlow? You're affected.

How severe is it?

CVSS 3.1

7.8 / 10

EPSS

0.2%

chance of exploitation in 30 days

Higher than 14% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Moderate

Exploitation Confidence

medium

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Local

AC Low

PR Low

UI None

S Unchanged

C High

I High

A High
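The 7.8 base score can be reproduced from the metrics above. The following is a minimal sketch, assuming the metric weights and the rounding rule published in the CVSS v3.1 specification; it handles only the Scope: Unchanged case that applies here, and the function names are illustrative.

```python
import math

# CVSS v3.1 metric weights (Scope: Unchanged), as published by FIRST.org.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """Round up to one decimal place using the integer method from CVSS v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """Compute the base score for a vector whose Scope is Unchanged."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR_UNCHANGED[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:L / AC:L / PR:L / UI:N / S:U / C:H / I:H / A:H
score = base_score_scope_unchanged("L", "L", "L", "N", "H", "H", "H")
print(score)
```

The local attack vector (weight 0.55) is what keeps the score below the 8.8 that the same vector would receive with AV:Network.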

What should I do?

5 steps

PATCH

Upgrade to TensorFlow 2.5.0+, or apply cherrypick commits ba6822bd and f6fde895 to 2.4.x/2.3.x/2.2.x/2.1.x branches.

INVENTORY

Audit all TensorFlow versions across training, inference, and developer environments, including base container images and frozen dependencies in MLOps pipelines.

ISOLATE

Restrict TF Serving endpoints and training job submission to authenticated, authorized users; block untrusted sparse tensor input paths at the API boundary.

HARDEN

Run TensorFlow workloads in containers with restrictive seccomp profiles and without CAP_SYS_ADMIN to limit post-exploitation impact.

DETECT

Alert on unexpected segfaults or OOM kills in TF worker processes, which may indicate active exploitation attempts.
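For the inventory step, a version string collected from each environment can be compared against the patched releases named in the advisory (2.5.0, 2.4.2, 2.3.3, 2.2.3, 2.1.4). The sketch below is illustrative: the helper name `is_patched` is hypothetical, branches older than 2.1 are assumed to be unpatched because no fix release is named for them, and pre-release or otherwise non-standard version strings are rejected for manual review rather than guessed at.

```python
import re

# Minimum patch level per release branch, taken from the advisory text.
PATCHED_MINIMUM = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def is_patched(version):
    """Return True if a plain X.Y.Z TensorFlow version includes the fix.

    Raises ValueError for strings that are not plain X.Y.Z (for example
    release candidates), so they can be reviewed by hand.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) >= (2, 5):
        return True
    minimum = PATCHED_MINIMUM.get((major, minor))
    # Assumption: branches without a named fix release are treated as unpatched.
    return minimum is not None and patch >= minimum


for candidate in ["2.5.0", "2.4.1", "2.3.3", "2.0.4"]:
    print(candidate, "patched" if is_patched(candidate) else "VULNERABLE")
```

In a live environment the input would typically be `tensorflow.__version__` or the version pinned in a requirements file or container image manifest.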

How is it classified?

Code Execution DoS Framework Inference Training Data

AML.T0010.001 - AI Software

AML.T0043.003 - Manual Modification

AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

ISO 42001

8.2 - AI risk assessment

NIST AI RMF

MANAGE-2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems

OWASP LLM Top 10

LLM03:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2021-29607?

TensorFlow's SparseAdd kernel does not check that its input tensors are non-empty or that the second dimension of each indices tensor matches the size of the corresponding shape tensor, letting any low-privileged local user trigger null pointer dereferences and heap out-of-bounds writes. Shared ML infrastructure (Jupyter hubs, Kubeflow clusters, TF Serving endpoints accepting user-supplied sparse inputs) is the primary exposure surface. Patch to TensorFlow 2.5.0 or the corresponding cherrypick releases (2.4.2, 2.3.3, 2.2.3, 2.1.4) immediately; no workaround exists short of restricting access to TF compute environments.

Is CVE-2021-29607 actively exploited?

There is no confirmed in-the-wild exploitation: CVE-2021-29607 is not listed in CISA KEV. However, proof-of-concept exploit code is publicly indexed (trickest/cve), which increases the risk of exploitation.

How to fix CVE-2021-29607?

1. PATCH: Upgrade to TensorFlow 2.5.0+, or apply cherrypick commits ba6822bd and f6fde895 to 2.4.x/2.3.x/2.2.x/2.1.x branches.
2. INVENTORY: Audit all TensorFlow versions across training, inference, and developer environments, including base container images and frozen dependencies in MLOps pipelines.
3. ISOLATE: Restrict TF Serving endpoints and training job submission to authenticated, authorized users; block untrusted sparse tensor input paths at the API boundary.
4. HARDEN: Run TensorFlow workloads in containers with restrictive seccomp profiles and without CAP_SYS_ADMIN to limit post-exploitation impact.
5. DETECT: Alert on unexpected segfaults or OOM kills in TF worker processes, which may indicate active exploitation attempts.

What systems are affected by CVE-2021-29607?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML infrastructure, recommendation systems.

What is the CVSS score for CVE-2021-29607?

CVE-2021-29607 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.23%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, shared ML infrastructure, recommendation systems

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0043.003 Manual Modification

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

ISO 42001: 8.2

NIST AI RMF: MANAGE-2.2

OWASP LLM Top 10: LLM03:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. Incomplete validation in `SparseAdd` results in allowing attackers to exploit undefined behavior (dereferencing null pointers) as well as write outside of bounds of heap allocated data. The implementation (https://github.com/tensorflow/tensorflow/blob/656e7673b14acd7835dc778867f84916c6d1cac2/tensorflow/core/kernels/sparse_sparse_binary_op_shared.cc) has a large set of validation for the two sparse tensor inputs (6 tensors in total), but does not validate that the tensors are not empty or that the second dimension of `*_indices` matches the size of corresponding `*_shape`. This allows attackers to send tensor triples that represent invalid sparse tensors to abuse code assumptions that are not protected by validation. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
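The two missing checks described in the advisory can be illustrated with a small pre-filter that an API boundary might run before handing a sparse tensor triple to TensorFlow. This is a sketch under stated assumptions, not the code from the upstream fix: the function name `sparse_triple_problems` is hypothetical, the inputs are plain Python lists rather than tensors, and rejecting every empty tensor is a deliberately conservative choice.

```python
def sparse_triple_problems(indices, values, dense_shape):
    """Return a list of problems found in an (indices, values, shape) triple.

    indices:     list of rows, one row of coordinates per stored value
    values:      list of stored values
    dense_shape: list of dimension sizes of the dense tensor
    An empty list means neither condition from the advisory was detected.
    """
    problems = []

    # Condition 1: none of the three tensors may be empty.
    if not indices or not values or not dense_shape:
        problems.append("indices, values, and shape must all be non-empty")
        return problems

    # Condition 2: the second dimension of indices (the row width) must
    # equal the number of elements in the shape tensor.
    rank = len(dense_shape)
    for row_number, row in enumerate(indices):
        if len(row) != rank:
            problems.append(
                f"indices row {row_number} has {len(row)} columns, expected {rank}"
            )
    return problems


# A well-formed 3x4 sparse tensor with two stored values.
print(sparse_triple_problems([[0, 0], [1, 2]], [1.0, 2.0], [3, 4]))

# Malformed: one-column indices paired with a two-element shape.
print(sparse_triple_problems([[0], [1]], [1.0, 2.0], [3, 4]))
```

A filter like this does not replace patching; it only narrows the input paths through which malformed triples can reach an unpatched kernel.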

Exploitation Scenario

An adversary with a valid account on a shared ML training cluster (compromised data scientist credentials, insider threat, or lateral movement from a workstation) submits a TensorFlow job that calls SparseAdd with a deliberately malformed sparse tensor triple: either an empty tensor or an indices tensor whose second dimension does not match the size of the corresponding shape tensor. Because none of the kernel's existing validation checks covers these two conditions, the input passes validation and the kernel proceeds to dereference a null pointer or write beyond the heap buffer boundary. The write corrupts adjacent heap data inside the TensorFlow process, including in-flight model parameters or training batches, and could be escalated via heap grooming techniques to arbitrary code execution on the ML host with the privileges of the TensorFlow process, exposing the model weights, datasets, and credentials reachable from that process on the shared node.

Weaknesses (CWE)

CWE-754 Improper Check for Unusual or Exceptional Conditions

CWE-754 — Improper Check for Unusual or Exceptional Conditions: The product does not check or incorrectly checks for unusual or exceptional conditions that are not expected to occur frequently during day to day operation of the product.

[Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Choose languages with features such as exception handling that force the programmer to anticipate unusual conditions that may generate exceptions. Custom exceptions may need to be developed to handle unusual business-logic conditions. Be careful not to pass sensitive exceptions back to the user (CWE-209, CWE-248).
[Implementation] Check the results of all functions that return a value and verify that the value is expected.

Source: MITRE CWE corpus.