CVE-2021-29516: TensorFlow null ptr deref crashes

Q: Is CVE-2021-29516 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2021-29516, increasing the risk of exploitation.

Q: What systems are affected by CVE-2021-29516?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML notebooks, MLOps platforms, feature engineering pipelines.

Q: What is the CVSS score for CVE-2021-29516?

CVE-2021-29516 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.20%.

CISO Take

A crafted empty RaggedTensor argument crashes TensorFlow processes via null pointer dereference (CWE-476), causing availability loss in training or inference workloads. Patch immediately to TF 2.5.0 or apply available backports (2.4.2/2.3.3/2.2.3/2.1.4). Risk is highest in shared ML infrastructure where low-privileged users can submit arbitrary TF ops.

What is the risk?

Medium risk overall, but operationally significant in multi-tenant ML environments. The local attack vector limits remote exposure — however, shared training clusters, MLOps job submission APIs, and hosted notebook platforms (Jupyter, Colab-style) expose this to low-privileged users. No data exfiltration or code execution; impact is confined to process availability. Exploitation is trivial once an attacker can execute TF code, making internal threat actors and compromised low-privilege accounts the primary concern.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	< 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1	2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

If you run a TensorFlow release older than the patched version for your minor line, you are affected.

How severe is it?

CVSS 3.1: 5.5 / 10 (MEDIUM)

EPSS: 0.2% chance of exploitation in the next 30 days (higher than 10% of all CVEs; source: EPSS v3, FIRST.org)

Exploitation status: Exploit available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation confidence: Medium
Signal: Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

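CVSS vector: CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Metric	Value
Attack Vector (AV)	Local
Attack Complexity (AC)	Low
Privileges Required (PR)	Low
User Interaction (UI)	None
Scope (S)	Unchanged
Confidentiality (C)	None
Integrity (I)	None
Availability (A)	High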
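To show how these metrics produce the 5.5 base score, here is a minimal sketch of the CVSS v3.1 base-score equations (metric weights and the rounding rule as given in the FIRST CVSS v3.1 specification), applied to this CVE's vector `CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H`. `base_score` is an illustrative helper, not part of any scanner; temporal and environmental metrics are out of scope.

```python
import math

# CVSS v3.1 base-metric weights, per the FIRST CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required is weighted differently when Scope is Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place using the spec's float-safe Roundup."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a vector string."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix}")
    m = dict(part.split(":", 1) for part in parts)

    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    # Impact Sub-Score from Confidentiality, Integrity and Availability.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))
```

For this vector, exploitability is about 1.83 and impact about 3.60; their sum, roughly 5.43, rounds up to 5.5. The high Availability impact alone drives the impact term, since Confidentiality and Integrity are None.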

What should I do?


1) PATCH: Upgrade to TensorFlow 2.5.0. If you must stay on an older minor line, upgrade to its patched release (2.4.2, 2.3.3, 2.2.3, or 2.1.4); for source builds pinned to other versions, cherry-pick commit b055b9c.
2) VALIDATE: Before invoking RaggedTensorToVariant, assert that the ragged tensor's splits are non-empty, and enforce input validation at pipeline boundaries.
3) ISOLATE: In shared ML infrastructure, run each tenant's workloads in separate processes or containers to contain the blast radius of a crash.
4) DETECT: Alert on unexpected TensorFlow worker process crashes, and audit job submission logs for RaggedTensorToVariant calls with unusual tensor shapes.
5) INVENTORY: Use dependency scanning (pip-audit, Snyk) to identify all production services pinned to a TensorFlow version without the fix (below 2.5.0 and older than the patched release for its minor line).
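For the INVENTORY step, a minimal sketch that classifies a TensorFlow version string against the patched releases named in the advisory (2.1.4, 2.2.3, 2.3.3, 2.4.2, and 2.5.0 onward). Two assumptions: releases older than 2.1 are treated as vulnerable because no backport is listed for them, and only plain `X.Y.Z` versions are accepted (pre-release builds raise an error instead of being guessed at). `is_vulnerable` is a hypothetical helper, not part of pip-audit or Snyk.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First fixed patch release per supported minor line, per the advisory.
FIXED_IN = {(2, 1): (2, 1, 4), (2, 2): (2, 2, 3), (2, 3): (2, 3, 3), (2, 4): (2, 4, 2)}


def is_vulnerable(tf_version: str) -> bool:
    """Return True if this TensorFlow version lacks the CVE-2021-29516 fix."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", tf_version)
    if match is None:
        raise ValueError(f"unsupported version format: {tf_version!r}")
    parsed = tuple(int(part) for part in match.groups())
    if parsed >= (2, 5, 0):
        return False  # 2.5.0 and later ship the fix
    fixed = FIXED_IN.get(parsed[:2])
    if fixed is None:
        return True  # assumption: lines older than 2.1 received no backport
    return parsed < fixed


if __name__ == "__main__":
    # Check whichever TensorFlow distributions are installed in this environment.
    for dist in ("tensorflow", "tensorflow-cpu", "tensorflow-gpu"):
        try:
            installed = version(dist)
        except PackageNotFoundError:
            continue
        try:
            status = "VULNERABLE" if is_vulnerable(installed) else "patched"
        except ValueError:
            status = "unrecognized version, check manually"
        print(dist, installed, status)
```

Failing closed on unrecognized version strings is a deliberate choice: a release candidate may or may not contain the fix, so it is flagged for manual review rather than silently passed.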

How is it classified?

Tags: DoS, Framework
MITRE ATLAS: AML.T0010.001 (AI Software), AML.T0029 (Denial of AI Service), AML.T0049 (Exploit Public-Facing Application)

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Art. 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2 - AI system risk management

NIST AI RMF

MANAGE-2.2 - Mechanisms to sustain the value of deployed AI systems are evaluated and applied

OWASP LLM Top 10

LLM04:2025 - Model Denial of Service

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, shared ML notebooks, MLOps platforms, feature engineering pipelines

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. Calling `tf.raw_ops.RaggedTensorToVariant` with arguments specifying an invalid ragged tensor results in a null pointer dereference. The implementation of the `RaggedTensorToVariant` operation (https://github.com/tensorflow/tensorflow/blob/904b3926ed1c6c70380d5313d282d248a776baa1/tensorflow/core/kernels/ragged_tensor_to_variant_op.cc#L39-L40) does not validate that the ragged tensor argument is non-empty. Since `batched_ragged` contains no elements, `batched_ragged.splits` is a null vector, thus `batched_ragged.splits(0)` will result in dereferencing `nullptr`. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
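The root cause is a missing emptiness check before `batched_ragged.splits(0)` is read. Below is a minimal sketch of the kind of pre-call validation the VALIDATE step suggests, written over plain Python lists so it runs without TensorFlow. `check_ragged_components` is a hypothetical helper, not a TensorFlow API; it enforces that at least one splits vector exists (the condition whose absence causes this crash) plus the usual ragged row-partition rules: each splits vector is non-empty, starts at 0, is non-decreasing, and ends at the number of rows in the next level down. In a real pipeline you would convert the nested splits and dense values to lists before calling it.

```python
from typing import Sequence


def check_ragged_components(nested_splits: Sequence[Sequence[int]], dense_values_len: int) -> None:
    """Raise ValueError unless the components describe a valid ragged tensor.

    nested_splits: one row-splits vector per ragged dimension, outermost first.
    dense_values_len: length of the outermost dimension of the flat values.
    """
    # The CVE-2021-29516 condition: no splits at all, so splits[0] does not exist.
    if len(nested_splits) == 0:
        raise ValueError("ragged tensor needs at least one row-splits vector")
    for level, splits in enumerate(nested_splits):
        if len(splits) == 0:
            raise ValueError(f"row-splits vector {level} is empty")
        if splits[0] != 0:
            raise ValueError(f"row-splits vector {level} must start at 0")
        if any(a > b for a, b in zip(splits, splits[1:])):
            raise ValueError(f"row-splits vector {level} must be non-decreasing")
        # The last split must equal the number of rows in the next level down.
        if level + 1 < len(nested_splits):
            next_len = len(nested_splits[level + 1]) - 1
        else:
            next_len = dense_values_len
        if splits[-1] != next_len:
            raise ValueError(
                f"row-splits vector {level} ends at {splits[-1]}, expected {next_len}"
            )


# A valid 2-row ragged tensor [[a, b], [c]] over 3 dense values passes silently.
check_ragged_components([[0, 2, 3]], 3)

# The PoC-shaped input (no splits at all) is rejected before it reaches the kernel.
try:
    check_ragged_components([], 3)
except ValueError as err:
    print("rejected:", err)
```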

Exploitation Scenario

An adversary with low-privileged access to a shared MLOps platform (e.g., internal Kubeflow or SageMaker-style cluster) submits a training job calling tf.raw_ops.RaggedTensorToVariant with an empty ragged tensor (null splits vector). The TF worker process dereferences a null pointer and crashes immediately, disrupting any co-located training jobs on the same worker. The attacker repeats submission in a loop, causing sustained denial of service across the cluster. In a multi-tenant environment, this becomes a low-effort lateral disruption tool — no elevated privileges, no special TF knowledge, just a one-liner crafted op call.
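The scenario depends on the crash taking down a worker process that other jobs share. A minimal sketch of the ISOLATE and DETECT steps, assuming each tenant job can be launched as its own Python subprocess: a segfault then kills only the child, and on POSIX systems the parent sees a negative return code naming the signal. `run_isolated` is a hypothetical helper, and the demo job simulates the crash with `os.kill` rather than calling TensorFlow.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> str:
    """Run untrusted job code in a child interpreter and report how it ended."""
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True, timeout=timeout)
    if proc.returncode >= 0:
        return f"exited with status {proc.returncode}"
    # On POSIX, a return code of -N means the child was killed by signal N.
    sig = signal.Signals(-proc.returncode).name
    return f"crashed: killed by {sig}"  # a DETECT alert could be raised here


# Stand-in for a job whose TF worker dereferences a null pointer.
crashing_job = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
print(run_isolated(crashing_job))  # the parent keeps running after the child dies
print(run_isolated("print('ok')"))
```

Process isolation does not stop a tenant from crashing its own job; it only keeps one crash from taking co-located jobs down with it, which is the property the scenario above exploits. Containers give the same property with stronger resource boundaries.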

Weaknesses (CWE)

CWE-476 NULL Pointer Dereference

CWE-476 — NULL Pointer Dereference: The product dereferences a pointer that it expects to be valid but is NULL.

[Implementation] For any pointers that could have been modified or provided from a function that can return NULL, check the pointer for NULL before use. When working with a multithreaded or otherwise asynchronous environment, ensure that proper locking APIs are used to lock before the check, and unlock when it has finished [REF-1484].
[Requirements] Select a programming language that is not susceptible to these issues.

Source: MITRE CWE corpus.