CVE-2021-29516: TensorFlow: null ptr deref crashes RaggedTensor ops

MEDIUM PoC AVAILABLE
Published May 14, 2021
CISO Take

A crafted empty RaggedTensor argument crashes TensorFlow processes via null pointer dereference (CWE-476), causing availability loss in training or inference workloads. Patch immediately to TF 2.5.0 or apply available backports (2.4.2/2.3.3/2.2.3/2.1.4). Risk is highest in shared ML infrastructure where low-privileged users can submit arbitrary TF ops.

What is the risk?

Medium risk overall, but operationally significant in multi-tenant ML environments. The local attack vector limits remote exposure — however, shared training clusters, MLOps job submission APIs, and hosted notebook platforms (Jupyter, Colab-style) expose this to low-privileged users. No data exfiltration or code execution; impact is confined to process availability. Exploitation is trivial once an attacker can execute TF code, making internal threat actors and compromised low-privilege accounts the primary concern.

What systems are affected?

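Package: TensorFlow
Ecosystem: pip
Vulnerable range: versions before 2.1.4, 2.2.0 through 2.2.2, 2.3.0 through 2.3.2, and 2.4.0 through 2.4.1
Patched: 2.5.0, with backports in 2.4.2, 2.3.3, 2.2.3, and 2.1.4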

Do you use TensorFlow? If your version predates the fixed releases, you're affected.
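Whether a given install falls in the affected range can be checked from its version string. The following is a minimal sketch, assuming the fixed releases are exactly those named in this advisory (2.5.0 and the 2.4.2, 2.3.3, 2.2.3, 2.1.4 backports). The names `PATCHED_MINIMUMS`, `parse_version`, and `is_patched` are hypothetical, and the check ignores pre-release suffixes and source builds that carry a cherry-picked fix.

```python
import re

# Assumption: fixed releases are the ones named in the advisory.
# Key = (major, minor) branch, value = first patch release with the fix.
PATCHED_MINIMUMS = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
FIRST_FIXED_BRANCH = (2, 5)


def parse_version(version):
    """Return (major, minor, patch) from strings such as '2.4.1' or '2.5.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version):
    """True if the version number alone indicates a release with the fix."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= FIRST_FIXED_BRANCH:
        return True
    minimum = PATCHED_MINIMUMS.get((major, minor))
    return minimum is not None and patch >= minimum


if __name__ == "__main__":
    for candidate in ("2.1.3", "2.3.3", "2.4.1", "2.5.0"):
        print(candidate, "patched" if is_patched(candidate) else "vulnerable")
```

The version string could come from `tensorflow.__version__` or `importlib.metadata.version("tensorflow")`. A dependency scanner remains the better tool for fleet-wide inventory.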

How severe is it?

CVSS 3.1: 5.5 / 10
EPSS: 0.2% chance of exploitation in 30 days (higher than 10% of all CVEs)
Exploitation status: Exploit available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation confidence: Medium (public PoC indexed in trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

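Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High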

What should I do?

  1. PATCH: Upgrade to TensorFlow 2.5.0 or to a backported release (2.4.2, 2.3.3, 2.2.3, 2.1.4). For older pinned deployments, apply commit b055b9c.
  2. VALIDATE: Add a pre-call assertion that the ragged tensor splits are non-empty before invoking RaggedTensorToVariant; enforce input validation at pipeline boundaries.
  3. ISOLATE: In shared ML infrastructure, run tenant workloads in separate processes or containers to contain the blast radius of any crash.
  4. DETECT: Alert on unexpected TensorFlow worker process crashes; audit job submission logs for RaggedTensorToVariant calls with unusual tensor shapes.
  5. INVENTORY: Identify all production services pinned to TF <2.5.0 using dependency scanning (pip-audit, Snyk).
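The VALIDATE step can be sketched as a pure-Python guard over ragged tensor components before they reach the op. This is an illustrative check, not the upstream fix: `check_ragged_components` is a hypothetical helper, it assumes the splits are available as plain integer lists, and it is deliberately strict in that it rejects an empty splits list in every case.

```python
def check_ragged_components(nested_splits, num_values):
    """Raise ValueError unless the splits describe a well-formed ragged tensor.

    nested_splits: list of row-splits lists (plain ints), outermost first.
    num_values: length of the flat values along their first dimension.
    """
    # Reject the degenerate inputs first: no splits at all, or an empty vector.
    if len(nested_splits) == 0:
        raise ValueError("no splits provided")
    for depth, splits in enumerate(nested_splits):
        if len(splits) == 0:
            raise ValueError(f"splits at depth {depth} are empty")

    for depth, splits in enumerate(nested_splits):
        if splits[0] != 0:
            raise ValueError(f"splits at depth {depth} must start at 0")
        if any(later < earlier for earlier, later in zip(splits, splits[1:])):
            raise ValueError(f"splits at depth {depth} must be non-decreasing")
        # The last split must equal the number of rows (or values) one level down.
        if depth + 1 < len(nested_splits):
            expected_end = len(nested_splits[depth + 1]) - 1
        else:
            expected_end = num_values
        if splits[-1] != expected_end:
            raise ValueError(
                f"splits at depth {depth} end at {splits[-1]}, expected {expected_end}"
            )


if __name__ == "__main__":
    check_ragged_components([[0, 2, 3]], num_values=3)  # well-formed, no error
    try:
        check_ragged_components([], num_values=0)
    except ValueError as err:
        print("rejected:", err)
```

Calling the guard at the pipeline boundary, before any tensor is built from user-supplied data, keeps malformed input from reaching native code at all.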

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2 - AI system risk management
NIST AI RMF
MANAGE-2.2 - Mechanisms to sustain the value of deployed AI systems are evaluated and applied
OWASP LLM Top 10
LLM04 - Model Denial of Service (2023 list; the 2025 list covers this under LLM10:2025 - Unbounded Consumption)

Frequently Asked Questions

What is CVE-2021-29516?

CVE-2021-29516 is a null pointer dereference (CWE-476) in TensorFlow's `tf.raw_ops.RaggedTensorToVariant`. Passing an empty ragged tensor argument crashes the TensorFlow process, causing availability loss in training or inference workloads. The fix ships in TF 2.5.0, with backports in 2.4.2, 2.3.3, 2.2.3, and 2.1.4. Risk is highest in shared ML infrastructure where low-privileged users can submit arbitrary TF ops.

Is CVE-2021-29516 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2021-29516, increasing the risk of exploitation.

How to fix CVE-2021-29516?

1. PATCH: Upgrade to TensorFlow 2.5.0 or to a backported release (2.4.2, 2.3.3, 2.2.3, 2.1.4). For older pinned deployments, apply commit b055b9c.
2. VALIDATE: Add a pre-call assertion that the ragged tensor splits are non-empty before invoking RaggedTensorToVariant; enforce input validation at pipeline boundaries.
3. ISOLATE: In shared ML infrastructure, run tenant workloads in separate processes or containers to contain the blast radius of any crash.
4. DETECT: Alert on unexpected TensorFlow worker process crashes; audit job submission logs for RaggedTensorToVariant calls with unusual tensor shapes.
5. INVENTORY: Identify all production services pinned to TF <2.5.0 using dependency scanning (pip-audit, Snyk).

What systems are affected by CVE-2021-29516?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML notebooks, MLOps platforms, feature engineering pipelines.

What is the CVSS score for CVE-2021-29516?

CVE-2021-29516 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.20%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, shared ML notebooks, MLOps platforms, feature engineering pipelines

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0029 Denial of AI Service
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art. 15
ISO 42001: A.6.2
NIST AI RMF: MANAGE-2.2
OWASP LLM Top 10: LLM04 (2023 list)

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. Calling `tf.raw_ops.RaggedTensorToVariant` with arguments specifying an invalid ragged tensor results in a null pointer dereference. The implementation of `RaggedTensorToVariant` operations (https://github.com/tensorflow/tensorflow/blob/904b3926ed1c6c70380d5313d282d248a776baa1/tensorflow/core/kernels/ragged_tensor_to_variant_op.cc#L39-L40) does not validate that the ragged tensor argument is non-empty. Since `batched_ragged` contains no elements, `batched_ragged.splits` is a null vector, thus `batched_ragged.splits(0)` will result in dereferencing `nullptr`. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
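To make the role of `splits` concrete, here is a toy pure-Python model of how a row-splits vector slices flat values into rows. It is an illustration only and not the TensorFlow kernel; `rows_from_splits` and `first_split` are hypothetical names. Reading element 0 of an empty splits list fails safely in Python with `IndexError`, whereas the advisory describes the C++ kernel dereferencing `nullptr` at the equivalent step.

```python
def rows_from_splits(values, row_splits):
    """Slice flat values into rows: row i is values[row_splits[i]:row_splits[i + 1]]."""
    return [values[start:end] for start, end in zip(row_splits, row_splits[1:])]


def first_split(row_splits):
    """Model of reading element 0 of the splits vector, with no emptiness check."""
    return row_splits[0]


if __name__ == "__main__":
    # Three rows, the middle one empty.
    print(rows_from_splits([3, 1, 4, 1, 5], [0, 2, 2, 5]))  # [[3, 1], [], [4, 1, 5]]

    # An empty splits vector has no element 0 to read.
    try:
        first_split([])
    except IndexError as err:
        print("empty splits:", err)
```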

Exploitation Scenario

An adversary with low-privileged access to a shared MLOps platform (e.g., internal Kubeflow or SageMaker-style cluster) submits a training job calling tf.raw_ops.RaggedTensorToVariant with an empty ragged tensor (null splits vector). The TF worker process dereferences a null pointer and crashes immediately, disrupting any co-located training jobs on the same worker. The attacker repeats submission in a loop, causing sustained denial of service across the cluster. In a multi-tenant environment, this becomes a low-effort lateral disruption tool — no elevated privileges, no special TF knowledge, just a one-liner crafted op call.
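The blast radius in this scenario depends on jobs sharing a process. A minimal sketch of per-job process isolation with crash classification follows, assuming a POSIX host where a negative return code from `subprocess` means the child was killed by a signal. `run_isolated` is a hypothetical helper, and the example child calls `os.abort()` to stand in for a crashing TensorFlow job.

```python
import signal
import subprocess
import sys


def run_isolated(argv, timeout=60):
    """Run argv in a child process and classify how it ended."""
    try:
        completed = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "returncode": None, "signal": None}

    code = completed.returncode
    if code < 0:
        # POSIX convention: the child was terminated by signal number -code.
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = f"signal {-code}"
        return {"status": "crashed", "returncode": code, "signal": name}

    status = "ok" if code == 0 else "failed"
    return {"status": status, "returncode": code, "signal": None}


if __name__ == "__main__":
    # Stand-in for a job that crashes in native code.
    result = run_isolated([sys.executable, "-c", "import os; os.abort()"])
    print(result["status"], result["signal"])
```

A scheduler that records the `crashed` status per submitting user gives the DETECT step a concrete signal: repeated signal-terminated jobs from one account are worth an alert.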

Weaknesses (CWE)

CWE-476 — NULL Pointer Dereference: The product dereferences a pointer that it expects to be valid but is NULL.

  • [Implementation] For any pointers that could have been modified or provided from a function that can return NULL, check the pointer for NULL before use. When working with a multithreaded or otherwise asynchronous environment, ensure that proper locking APIs are used to lock before the check, and unlock when it has finished [REF-1484].
  • [Requirements] Select a programming language that is not susceptible to these issues.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
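The 5.5 base score can be reproduced from this vector. The sketch below follows the CVSS v3.1 base score equations and metric weights as published by FIRST; the function names are hypothetical, and only the base metrics of 3.1 vectors are handled.

```python
import math

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is Unchanged (U) or Changed (C).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a 'CVSS:3.1/...' vector string."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are supported")
    m = dict(part.split(":", 1) for part in parts)
    scope = m["S"]

    iss = 1 - (1 - WEIGHTS["C"][m["C"]]) * (1 - WEIGHTS["I"][m["I"]]) * (1 - WEIGHTS["A"][m["A"]])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]] * WEIGHTS["UI"][m["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

For this CVE the impact sub-score is 6.42 × 0.56 ≈ 3.60 and exploitability is about 1.83. Their sum, about 5.43, rounds up to 5.5.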

Timeline

Published: May 14, 2021
Last Modified: November 21, 2024
First Seen: May 14, 2021

Related Vulnerabilities