CVE-2021-29607: TensorFlow: heap OOB write in SparseAdd op

HIGH PoC AVAILABLE
Published May 14, 2021
CISO Take

TensorFlow's SparseAdd kernel does not check for empty input tensors or for a mismatch between the second dimension of an indices tensor and the size of its shape tensor, so a low-privileged local user can trigger null pointer dereferences and heap out-of-bounds writes. Shared ML infrastructure (Jupyter hubs, Kubeflow clusters, TF Serving endpoints that accept user-supplied sparse inputs) is the primary exposure surface. Upgrade to TensorFlow 2.5.0, or to the cherry-picked fixes in 2.4.2, 2.3.3, 2.2.3, or 2.1.4; the only workaround is restricting access to TF compute environments.

Risk Assessment

High (CVSS 7.8). The local attack vector limits exposure relative to network-exploitable bugs, but multi-tenant ML environments (shared GPU clusters, JupyterHub, Kubeflow) widen the practical attack surface considerably. Low attack complexity and no required user interaction mean that any authenticated tenant on a shared training cluster can reliably trigger the memory corruption. The CVE is not in CISA KEV, so no in-the-wild exploitation is confirmed there, but a heap OOB write is a serious primitive that can be chained toward code execution. AI/ML organizations running older TensorFlow branches (2.1–2.4) in production training pipelines carry the highest residual risk.

Affected Systems

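Package: tensorflow
Ecosystem: pip
Vulnerable Range: not listed here; per the NVD description, the 2.1.x through 2.4.x branches are affected before their cherry-picked fixes
Patched: 2.5.0, with cherry-picks in 2.4.2, 2.3.3, 2.2.3, and 2.1.4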

Do you use tensorflow? You are affected if your installed release predates the fixed versions.
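One way to check a single version string against the fixed releases is sketched below. The release numbers come from the NVD description on this page; the function names are hypothetical, and treating branches with no listed cherry-pick (for example 2.0.x or 1.x) as unpatched is an assumption, not something the advisory states.

```python
import re

# First fixed patch release per minor series, taken from the NVD description.
FIXED_PATCH_BY_SERIES = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
# First mainline release that ships the fix.
FIRST_FIXED_MAINLINE = (2, 5, 0)


def parse_version(version):
    """Extract (major, minor, patch) from a string such as '2.4.1'.

    Pre-release or local suffixes (e.g. 'rc0', '+cpu') are ignored, so
    release candidates need a manual look.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version):
    """Return True if the version is at or past a release carrying the fix."""
    major, minor, patch = parse_version(version)
    if (major, minor, patch) >= FIRST_FIXED_MAINLINE:
        return True
    fixed_patch = FIXED_PATCH_BY_SERIES.get((major, minor))
    # Assumption: a series with no listed cherry-pick is treated as unpatched.
    return fixed_patch is not None and patch >= fixed_patch


if __name__ == "__main__":
    for candidate in ("2.1.3", "2.3.3", "2.4.1", "2.5.0"):
        status = "patched" if is_patched(candidate) else "VULNERABLE"
        print(candidate, status)
```

In an environment where TensorFlow is importable, passing `tf.__version__` to `is_patched` would apply the same check to the installed build.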

Severity & Risk

CVSS 3.1
7.8 / 10
EPSS
0.02%
chance of exploitation in 30 days
Higher than 5% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Moderate
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

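Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High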

Recommended Action

5 steps
  1. PATCH

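    Upgrade to TensorFlow 2.5.0 or later, or to the patched maintenance releases 2.4.2, 2.3.3, 2.2.3, or 2.1.4. If you build from source, apply cherry-pick commits ba6822bd and f6fde895 to the 2.4.x/2.3.x/2.2.x/2.1.x branches.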

  2. INVENTORY

    Audit all TensorFlow versions across training, inference, and developer environments — including base container images and frozen dependencies in MLOps pipelines.

  3. ISOLATE

    Restrict TF Serving endpoints and training job submission to authenticated, authorized users; block untrusted sparse tensor input paths at the API boundary.

  4. HARDEN

    Run TensorFlow workloads in containers with restrictive seccomp profiles and without CAP_SYS_ADMIN to limit post-exploitation impact.

  5. DETECT

    Alert on unexpected segfaults or OOM kills in TF worker processes, which may indicate active exploitation attempts.
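For the DETECT step, a minimal sketch of classifying a worker's exit status is shown below. It assumes a POSIX host, where a process killed by a signal reports a negative return code to Python's `subprocess` module or `128 + signal` through a shell. The function name and the set of signals treated as suspicious are illustrative choices, not part of any TensorFlow tooling.

```python
import signal

# Signals that commonly accompany memory corruption or a forced kill.
# SIGKILL is included because the Linux OOM killer terminates with it.
SUSPICIOUS_SIGNALS = {
    signal.SIGSEGV: "segmentation fault",
    signal.SIGBUS: "bus error",
    signal.SIGABRT: "abort",
    signal.SIGKILL: "killed (possibly by the OOM killer)",
}


def classify_exit(returncode):
    """Return a description if the exit status suggests a crash, else None."""
    if returncode < 0:
        # subprocess convention: -N means "terminated by signal N".
        signum = -returncode
    elif returncode > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = returncode - 128
    else:
        return None
    for sig, label in SUSPICIOUS_SIGNALS.items():
        if int(sig) == signum:
            return label
    return None


if __name__ == "__main__":
    for code in (0, 1, -int(signal.SIGSEGV), 128 + int(signal.SIGKILL)):
        print(code, classify_exit(code))
```

A job wrapper could call `classify_exit` on the return code of each TF worker and forward any non-None result to the alerting system. A crash by itself does not prove exploitation, so this is a triage signal rather than a verdict.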

Classification

Compliance Impact

This CVE is relevant to:

ISO 42001
8.2 - AI risk assessment
NIST AI RMF
MANAGE-2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM03:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2021-29607?

CVE-2021-29607 is an incomplete-validation flaw in TensorFlow's SparseAdd kernel. The kernel does not check that its input tensors are non-empty or that the second dimension of each indices tensor matches the size of the corresponding shape tensor, so a malformed sparse tensor can cause a null pointer dereference or a heap out-of-bounds write. The fix ships in TensorFlow 2.5.0 and is cherry-picked to 2.4.2, 2.3.3, 2.2.3, and 2.1.4.

Is CVE-2021-29607 actively exploited?

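CVE-2021-29607 is not listed in CISA KEV, so no in-the-wild exploitation is confirmed there. Proof-of-concept exploit code is publicly available (indexed in trickest/cve), which increases the risk of exploitation.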

How to fix CVE-2021-29607?

1. PATCH: Upgrade to TensorFlow 2.5.0 or later, or apply cherry-pick commits ba6822bd and f6fde895 to the 2.4.x/2.3.x/2.2.x/2.1.x branches.
2. INVENTORY: Audit all TensorFlow versions across training, inference, and developer environments, including base container images and frozen dependencies in MLOps pipelines.
3. ISOLATE: Restrict TF Serving endpoints and training job submission to authenticated, authorized users; block untrusted sparse tensor input paths at the API boundary.
4. HARDEN: Run TensorFlow workloads in containers with restrictive seccomp profiles and without CAP_SYS_ADMIN to limit post-exploitation impact.
5. DETECT: Alert on unexpected segfaults or OOM kills in TF worker processes, which may indicate active exploitation attempts.

What systems are affected by CVE-2021-29607?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML infrastructure, recommendation systems.

What is the CVSS score for CVE-2021-29607?

CVE-2021-29607 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.02%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. Incomplete validation in `SparseAdd` results in allowing attackers to exploit undefined behavior (dereferencing null pointers) as well as write outside of bounds of heap allocated data. The implementation (https://github.com/tensorflow/tensorflow/blob/656e7673b14acd7835dc778867f84916c6d1cac2/tensorflow/core/kernels/sparse_sparse_binary_op_shared.cc) has a large set of validation for the two sparse tensor inputs (6 tensors in total), but does not validate that the tensors are not empty or that the second dimension of `*_indices` matches the size of corresponding `*_shape`. This allows attackers to send tensor triples that represent invalid sparse tensors to abuse code assumptions that are not protected by validation. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
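The two missing checks named in the description can be illustrated with a small pre-validation sketch in plain Python. This is not the upstream fix: the function name is hypothetical, the inputs are ordinary nested lists rather than tensors, and rejecting a triple when any of its three parts is empty is an assumption that may be stricter than what the patched kernel enforces.

```python
def check_sparse_triple(indices, values, dense_shape):
    """Return a list of problems found in one (indices, values, shape) triple.

    indices:     list of rows, one row of coordinates per stored value
    values:      list of stored values
    dense_shape: list giving the size of each dimension
    """
    problems = []

    # Check 1 (assumed strict form): none of the three parts may be empty.
    if len(indices) == 0 or len(values) == 0 or len(dense_shape) == 0:
        problems.append("empty tensor in sparse triple")
        return problems

    # Check 2: the second dimension of indices must equal the size of shape.
    rank = len(dense_shape)
    for row_number, row in enumerate(indices):
        if len(row) != rank:
            problems.append(
                f"indices row {row_number} has {len(row)} columns, expected {rank}"
            )
            break

    return problems


if __name__ == "__main__":
    well_formed = ([[0, 0], [1, 1]], [1.0, 2.0], [2, 2])
    mismatched = ([[0, 0, 0]], [1.0], [2, 2])
    print(check_sparse_triple(*well_formed))
    print(check_sparse_triple(*mismatched))
```

A service that accepts sparse inputs from users could run a check of this kind on both operands at the API boundary and reject the request when the returned list is non-empty. That reduces exposure on unpatched builds but does not replace upgrading.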

Exploitation Scenario

An adversary with a valid account on a shared ML training cluster (through compromised data scientist credentials, insider access, or lateral movement from a workstation) submits a TensorFlow job that calls SparseAdd with a deliberately malformed sparse tensor triple: either an empty tensor or an indices tensor whose second dimension does not match the size of the corresponding shape tensor. Because the kernel's existing validation covers neither case, the input is accepted and the kernel goes on to dereference a null pointer or write past the end of a heap buffer. The write corrupts heap memory in the TensorFlow process that runs the op, which can include in-flight model parameters or training batches. With heap grooming it could potentially be escalated to arbitrary code execution on the ML host, exposing the model weights, datasets, and credentials that the process can reach.

Weaknesses (CWE)

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
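As a worked check of how this vector yields the 7.8 base score, the sketch below applies the base-score equations and metric weights from the CVSS v3.1 specification. The function names are illustrative, and the code handles base metrics only (no temporal or environmental metrics).

```python
import math

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on Scope (U = unchanged, C = changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector such as 'CVSS:3.1/AV:L/...'."""
    # Skip the leading 'CVSS:3.1' prefix and parse the metric pairs.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    scope = metrics["S"]

    c, i, a = (WEIGHTS["CIA"][metrics[key]] for key in ("C", "I", "A"))
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15

    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[scope][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum, roughly 7.71, rounds up to 7.8.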

Timeline

Published
May 14, 2021
Last Modified
November 21, 2024
First Seen
May 14, 2021

Related Vulnerabilities