CVE-2021-29607: TensorFlow: heap OOB write in SparseAdd op
HIGH | PoC AVAILABLE
TensorFlow's SparseAdd kernel skips validation for empty tensors and dimension mismatches, letting any low-privileged local user trigger null pointer dereferences and heap out-of-bounds writes. Shared ML infrastructure (Jupyter hubs, Kubeflow clusters, TF Serving endpoints accepting user-supplied sparse inputs) is the primary exposure surface. Upgrade immediately to TensorFlow 2.5.0 or to one of the cherrypicked releases (2.4.2, 2.3.3, 2.2.3, 2.1.4); no workaround exists short of restricting access to TF compute environments.
What is the risk?
High (CVSS 7.8). Local attack vector bounds the exposure relative to network-exploitable bugs, but multi-tenant ML environments (shared GPU clusters, JupyterHub, Kubeflow) expand the practical attack surface considerably. Low complexity with no user interaction required means any authenticated tenant on a shared training cluster can reliably trigger the memory corruption primitive. Not in CISA KEV, indicating limited observed in-the-wild exploitation, but the heap OOB write is a serious primitive that can be chained toward code execution. AI/ML organizations running older TensorFlow branches (2.1–2.4) in production training pipelines carry the highest residual risk.
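The 7.8 figure can be reproduced from the vector listed under CVSS Vector in this advisory. The following is a minimal sketch of the CVSS v3.1 base-score arithmetic, using the metric weights published in the CVSS v3.1 specification. It only handles vectors with Scope Unchanged (S:U), which is the case here, and the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope Unchanged values for PR).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS v3.1 vector with Scope Unchanged."""
    # Drop the leading "CVSS:3.1" token and split the rest into metric:value pairs.
    fields = dict(part.split(":") for part in vector.split("/")[1:])
    if fields["S"] != "U":
        raise NotImplementedError("this sketch only covers Scope Unchanged")
    w = {name: WEIGHTS[name][fields[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H")
```

With this vector the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum rounds up to 7.8. The local attack vector (AV:L) is what keeps exploitability low compared with a network-reachable bug.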
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| TensorFlow | pip | < 2.1.4; 2.2.0–2.2.2; 2.3.0–2.3.2; 2.4.0–2.4.1 | 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0 |
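Do you use TensorFlow? You are affected if any environment runs a release older than 2.5.0 that does not include the cherrypicked fix (2.4.2, 2.3.3, 2.2.3, or 2.1.4).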
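A quick way to triage an environment is to compare its installed version against the patched releases named in the original advisory (2.5.0, 2.4.2, 2.3.3, 2.2.3, 2.1.4). The sketch below is illustrative: the helper names are hypothetical, and it assumes that branches older than 2.1, which the advisory does not list as receiving a fix, should be treated as vulnerable.

```python
import re
from typing import Tuple

# First patched patch-level per release branch, taken from the advisory text.
FIRST_PATCHED = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '2.4.1' or '2.4.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def is_vulnerable(version: str) -> bool:
    """Return True if the version predates the fix for CVE-2021-29607."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= (2, 5):
        return False  # 2.5.0 and later include the fix
    first_patched = FIRST_PATCHED.get((major, minor))
    if first_patched is None:
        # Assumption: branches older than 2.1 received no backport.
        return True
    return patch < first_patched
```

In practice the version string would come from the environment being audited, for example `importlib.metadata.version("tensorflow")` or the pinned version in a requirements file or container image manifest.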
What should I do?
5 steps
- PATCH: Upgrade to TensorFlow 2.5.0+ or to a patched point release (2.4.2, 2.3.3, 2.2.3, 2.1.4). For custom builds, apply cherrypick commits ba6822bd and f6fde895 to the 2.4.x/2.3.x/2.2.x/2.1.x branches.
- INVENTORY: Audit all TensorFlow versions across training, inference, and developer environments, including base container images and frozen dependencies in MLOps pipelines.
- ISOLATE: Restrict TF Serving endpoints and training job submission to authenticated, authorized users; block untrusted sparse tensor input paths at the API boundary.
- HARDEN: Run TensorFlow workloads in containers with restrictive seccomp profiles and without CAP_SYS_ADMIN to limit post-exploitation impact.
- DETECT: Alert on unexpected segfaults or OOM kills in TF worker processes, which may indicate active exploitation attempts.
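For the DETECT step, one low-cost signal is the exit status of worker processes. The sketch below assumes a POSIX host and a hypothetical `classify_exit` helper. It relies on two conventions: Python's `subprocess` reports death by signal as a negative return code, and shells and container runtimes typically report it as 128 plus the signal number. A SIGKILL is only a hint of an OOM kill, not proof of one.

```python
import signal


def classify_exit(returncode: int) -> str:
    """Map a worker exit status to a coarse label for alerting (POSIX only)."""
    if returncode < 0:
        signum = -returncode          # subprocess convention: -N means killed by signal N
    elif returncode > 128:
        signum = returncode - 128     # shell/container convention: 128 + N
    else:
        return "ok" if returncode == 0 else "error"

    if signum == signal.SIGSEGV:
        return "segfault"             # possible memory-corruption crash, worth an alert
    if signum == signal.SIGKILL:
        return "killed"               # may be the OOM killer; confirm in kernel logs
    if signum == signal.SIGABRT:
        return "abort"
    return f"signal-{signum}"
```

A supervisor could call `classify_exit(proc.returncode)` after each job and raise an alert on "segfault" or on repeated "killed" results from the same tenant.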
Frequently Asked Questions
Is CVE-2021-29607 actively exploited?
CVE-2021-29607 is not listed in CISA KEV, which points to limited observed in-the-wild exploitation. However, proof-of-concept exploit code is publicly available, which increases the risk of exploitation.
What systems are affected by CVE-2021-29607?
This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML infrastructure, recommendation systems.
What is the CVSS score for CVE-2021-29607?
CVE-2021-29607 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.23%.
What is the AI security impact?
Affected AI Architectures
MITRE ATLAS Techniques
- AML.T0010.001 AI Software
- AML.T0043.003 Manual Modification
- AML.T0049 Exploit Public-Facing Application
What are the technical details?
Original Advisory
TensorFlow is an end-to-end open source platform for machine learning. Incomplete validation in `SparseAdd` results in allowing attackers to exploit undefined behavior (dereferencing null pointers) as well as write outside of bounds of heap allocated data. The implementation (https://github.com/tensorflow/tensorflow/blob/656e7673b14acd7835dc778867f84916c6d1cac2/tensorflow/core/kernels/sparse_sparse_binary_op_shared.cc) has a large set of validation for the two sparse tensor inputs (6 tensors in total), but does not validate that the tensors are not empty or that the second dimension of `*_indices` matches the size of corresponding `*_shape`. This allows attackers to send tensor triples that represent invalid sparse tensors to abuse code assumptions that are not protected by validation. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
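The two missing checks described in the advisory can also be enforced at an API boundary before a request ever reaches TensorFlow. What follows is a minimal sketch in plain Python, not the upstream C++ patch. The function and exception names are hypothetical, inputs are assumed to arrive as nested lists, and rejecting every empty tensor may be stricter than what the upstream fix does.

```python
from typing import Sequence


class InvalidSparseTensor(ValueError):
    """Raised when an (indices, values, shape) triple is not a valid sparse tensor."""


def validate_sparse_triple(
    indices: Sequence[Sequence[int]],
    values: Sequence[float],
    shape: Sequence[int],
) -> None:
    # Check 1 from the advisory: none of the tensors may be empty.
    if len(indices) == 0 or len(values) == 0 or len(shape) == 0:
        raise InvalidSparseTensor("indices, values, and shape must be non-empty")

    # Check 2 from the advisory: the second dimension of indices must
    # match the number of elements in shape.
    rank = len(shape)
    for row in indices:
        if len(row) != rank:
            raise InvalidSparseTensor(
                f"index {list(row)} has {len(row)} coordinates, expected {rank}"
            )

    # One value per index row, so the triple is internally consistent.
    if len(values) != len(indices):
        raise InvalidSparseTensor("values and indices must have the same length")


def validate_sparse_add_inputs(a, b) -> None:
    """Validate both operands, each given as an (indices, values, shape) triple."""
    validate_sparse_triple(*a)
    validate_sparse_triple(*b)
    if len(a[2]) != len(b[2]):
        raise InvalidSparseTensor("operands must have the same rank")
```

A serving layer could call `validate_sparse_add_inputs` on decoded request payloads and return a client error instead of forwarding a malformed triple to the kernel. This reduces exposure but does not replace upgrading to a patched release.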
Exploitation Scenario
An adversary with a valid account on a shared ML training cluster (compromised data scientist credentials, insider threat, or lateral movement from a workstation) submits a TensorFlow job that calls SparseAdd with a deliberately malformed sparse tensor triple: either an empty tensor or an indices tensor whose second dimension does not match the size of the corresponding shape tensor. The kernel's existing validation does not cover either condition, so the input passes every check and the kernel goes on to dereference a null pointer or write beyond the bounds of a heap buffer. On a multi-tenant GPU node, the corruption lands in the memory of the TensorFlow process that runs the op, which may hold in-flight model parameters or training batches. With heap grooming techniques it could potentially be escalated to arbitrary code execution on the ML host, exposing the model weights, datasets, and credentials that the process can reach.
Weaknesses (CWE)
CWE-754 — Improper Check for Unusual or Exceptional Conditions: The product does not check or incorrectly checks for unusual or exceptional conditions that are not expected to occur frequently during day to day operation of the product.
- [Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Choose languages with features such as exception handling that force the programmer to anticipate unusual conditions that may generate exceptions. Custom exceptions may need to be developed to handle unusual business-logic conditions. Be careful not to pass sensitive exceptions back to the user (CWE-209, CWE-248).
- [Implementation] Check the results of all functions that return a value and verify that the value is expected.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
References
- github.com/tensorflow/tensorflow/commit/ba6822bd7b7324ba201a28b2f278c29a98edbef2 Patch 3rd Party
- github.com/tensorflow/tensorflow/commit/f6fde895ef9c77d848061c0517f19d0ec2682f3a Patch 3rd Party
- github.com/tensorflow/tensorflow/security/advisories/GHSA-gv26-jpj9-c8gq Exploit Patch 3rd Party
Related Vulnerabilities
- CVE-2020-15196 (9.9) TensorFlow: heap OOB read in sparse/ragged count ops. Same package: tensorflow
- CVE-2020-15205 (9.8) TensorFlow: heap overflow in StringNGrams, ASLR bypass. Same package: tensorflow
- CVE-2020-15208 (9.8) TFLite: OOB read/write via tensor dimension mismatch. Same package: tensorflow
- CVE-2019-16778 (9.8) TensorFlow: heap overflow in UnsortedSegmentSum op. Same package: tensorflow
- CVE-2022-23587 (9.8) TensorFlow: integer overflow in Grappler enables RCE. Same package: tensorflow