CVE-2021-29516: TensorFlow: null ptr deref crashes RaggedTensor ops
MEDIUM · PoC AVAILABLE
A crafted empty RaggedTensor argument crashes TensorFlow processes via a null pointer dereference (CWE-476), causing availability loss in training or inference workloads. Upgrade promptly to TF 2.5.0 or to a patched backport release (2.4.2, 2.3.3, 2.2.3 or 2.1.4). Risk is highest in shared ML infrastructure where low-privileged users can submit arbitrary TF ops.
What is the risk?
Medium risk overall, but operationally significant in multi-tenant ML environments. The local attack vector limits remote exposure; however, shared training clusters, MLOps job-submission APIs, and hosted notebook platforms (Jupyter, Colab-style) expose this to low-privileged users. There is no data exfiltration or code execution: the impact is confined to process availability. Exploitation is trivial once an attacker can execute TF code, which makes insider threats and compromised low-privilege accounts the primary concern.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| TensorFlow | pip | < 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1 | 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0 |
Do you run a TensorFlow release in one of the vulnerable ranges above? You're affected.
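To triage many environments quickly, the ranges above can be turned into a version check. This is a minimal sketch: `is_vulnerable` is a hypothetical helper, the backport table is transcribed from the advisory text (fix in 2.5.0, cherry-picked onto 2.4.2, 2.3.3, 2.2.3 and 2.1.4), and treating release lines that received no backport (2.0.x, 1.x) as affected is an assumption.

```python
import re

# First patched release in each minor line that received a backport,
# transcribed from the advisory: 2.4.2, 2.3.3, 2.2.3 and 2.1.4.
FIRST_PATCHED_BACKPORT = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def is_vulnerable(version: str) -> bool:
    """Return True if a TensorFlow version string falls in a CVE-2021-29516 range."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # Pre-release or local suffixes (e.g. "rc0", "+cpu") are ignored, so a
    # release candidate is classified like its final release.
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) >= (2, 5):
        return False  # 2.5.0 and later ship the fix
    first_patched = FIRST_PATCHED_BACKPORT.get((major, minor))
    if first_patched is None:
        # Assumption: lines without a backport (2.0.x, 1.x) remain affected.
        return True
    return patch < first_patched
```

In a running environment you would pass `tf.__version__`; for an inventory sweep, feed it the versions pinned in your lockfiles.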
What should I do?
1) PATCH: Upgrade to TensorFlow 2.5.0. Deployments pinned to an older minor line should move to the matching backport release (2.4.2, 2.3.3, 2.2.3 or 2.1.4), each of which carries fix commit b055b9c.
2) VALIDATE: Before invoking RaggedTensorToVariant, assert that the ragged tensor's splits list is non-empty; enforce input validation at pipeline boundaries.
3) ISOLATE: In shared ML infrastructure, run tenant workloads in separate processes or containers to contain the blast radius of any crash.
4) DETECT: Alert on unexpected TensorFlow worker process crashes; audit job submission logs for RaggedTensorToVariant calls with unusual tensor shapes.
5) INVENTORY: Use dependency scanning (pip-audit, Snyk) to identify every production service pinned to a vulnerable TensorFlow release, i.e. anything below 2.5.0 other than the patched backports.
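Step 2 (VALIDATE) can be sketched as a pre-call check. This is illustrative rather than TensorFlow's own validation: `validate_ragged_components` and `guarded_ragged_tensor_to_variant` are hypothetical names, and the sketch assumes the splits arrive as plain Python integer lists (for example, deserialized from a job request). Only the first check maps directly to CVE-2021-29516 (an empty splits list with `batched_input=True`); the structural checks that follow are extra hardening based on the usual row-splits invariants (start at 0, non-decreasing, last value equal to the row count of the next level).

```python
from typing import Sequence


def validate_ragged_components(
    nested_splits: Sequence[Sequence[int]], num_values: int, batched_input: bool
) -> None:
    """Raise ValueError for ragged components that should never reach the raw op."""
    if len(nested_splits) == 0:
        if batched_input:
            # The CVE-2021-29516 trigger: batched input with no splits vectors.
            raise ValueError("batched_input=True requires at least one splits vector")
        return  # assumption: an empty splits list is tolerated for unbatched input
    for level, splits in enumerate(nested_splits):
        if len(splits) == 0:
            raise ValueError(f"splits vector at level {level} is empty")
        if splits[0] != 0:
            raise ValueError(f"splits vector at level {level} must start at 0")
        if any(later < earlier for earlier, later in zip(splits, splits[1:])):
            raise ValueError(f"splits vector at level {level} must be non-decreasing")
    # Each level's last split must equal the number of rows at the next level.
    for level in range(len(nested_splits) - 1):
        if nested_splits[level][-1] != len(nested_splits[level + 1]) - 1:
            raise ValueError(f"level {level} does not match the rows of level {level + 1}")
    if nested_splits[-1][-1] != num_values:
        raise ValueError("innermost splits must end at the number of dense values")


def guarded_ragged_tensor_to_variant(nested_splits, dense_values, batched_input=True):
    """Validate plain-Python components, then call the raw op (needs TensorFlow)."""
    import tensorflow as tf  # deferred so the validator above works without TF

    validate_ragged_components(nested_splits, len(dense_values), batched_input)
    return tf.raw_ops.RaggedTensorToVariant(
        rt_nested_splits=[tf.constant(s, dtype=tf.int64) for s in nested_splits],
        rt_dense_values=tf.constant(dense_values),
        batched_input=batched_input,
    )
```

Keeping the validator free of TensorFlow imports lets it run at the API gateway, before a job ever reaches a worker process.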
Frequently Asked Questions
Is CVE-2021-29516 actively exploited?
Proof-of-concept exploit code is publicly available for CVE-2021-29516, increasing the risk of exploitation.
What systems are affected by CVE-2021-29516?
This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML notebooks, MLOps platforms, feature engineering pipelines.
What is the CVSS score for CVE-2021-29516?
CVE-2021-29516 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.20%.
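The 5.5 base score follows from the vector `CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H` via the CVSS v3.1 base equations. For this vector the exploitability sub-score is 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.83 and the impact sub-score is 6.42 × 0.56 ≈ 3.60 (availability High, confidentiality and integrity None), so the sum of ≈ 5.43 rounds up to 5.5. The sketch below implements those equations and the specification's Roundup function for base metrics only (no temporal or environmental metrics); `base_score` is a hypothetical helper name, and the weights are copied from the CVSS v3.1 specification.

```python
import math

# CVSS v3.1 base-metric weights (from the FIRST CVSS v3.1 specification).
ATTACK_VECTOR = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
ATTACK_COMPLEXITY = {"L": 0.77, "H": 0.44}
PRIVILEGES_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PRIVILEGES_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
USER_INTERACTION = {"N": 0.85, "R": 0.62}
CIA_IMPACT = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    scope_changed = metrics["S"] == "C"
    # Impact sub-score (ISS) combines the three CIA impact weights.
    iss = 1 - (
        (1 - CIA_IMPACT[metrics["C"]])
        * (1 - CIA_IMPACT[metrics["I"]])
        * (1 - CIA_IMPACT[metrics["A"]])
    )
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    privileges = PRIVILEGES_CHANGED if scope_changed else PRIVILEGES_UNCHANGED
    exploitability = (
        8.22
        * ATTACK_VECTOR[metrics["AV"]]
        * ATTACK_COMPLEXITY[metrics["AC"]]
        * privileges[metrics["PR"]]
        * USER_INTERACTION[metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```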
What is the AI security impact?
MITRE ATLAS Techniques
- AML.T0010.001 AI Software
- AML.T0029 Denial of AI Service
- AML.T0049 Exploit Public-Facing Application
What are the technical details?
Original Advisory
TensorFlow is an end-to-end open source platform for machine learning. Calling `tf.raw_ops.RaggedTensorToVariant` with arguments specifying an invalid ragged tensor results in a null pointer dereference. The implementation of `RaggedTensorToVariant` operations (https://github.com/tensorflow/tensorflow/blob/904b3926ed1c6c70380d5313d282d248a776baa1/tensorflow/core/kernels/ragged_tensor_to_variant_op.cc#L39-L40) does not validate that the ragged tensor argument is non-empty. Since `batched_ragged` contains no elements, `batched_ragged.splits` is a null vector, thus `batched_ragged.splits(0)` will result in dereferencing `nullptr`. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
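Because the null dereference takes down the whole interpreter, checking whether a given installation is patched is safest in a throwaway child process. The sketch below runs a reproducer modelled on the advisory's description (an empty `rt_nested_splits` list with `batched_input=True`) in a subprocess and maps the exit status to a verdict. Several things here are assumptions: the exit codes 3 and 4 are conventions of this sketch; the expectation that a patched build raises `tf.errors.InvalidArgumentError` for this input is unverified, so any other exception is reported as inconclusive; and reading a negative return code as "killed by a signal" holds on POSIX systems only. Run it only in an isolated environment you control.

```python
import subprocess
import sys

# Child script modelled on the advisory: an empty splits list with
# batched_input=True. Exit codes 3 and 4 are conventions of this sketch.
PROBE_SCRIPT = """
import sys
try:
    import tensorflow as tf
except ImportError:
    sys.exit(3)
try:
    tf.raw_ops.RaggedTensorToVariant(
        rt_nested_splits=[], rt_dense_values=[1, 2, 3], batched_input=True)
except tf.errors.InvalidArgumentError:
    sys.exit(0)
sys.exit(4)
"""


def classify(returncode: int) -> str:
    """Map the child's exit status to a human-readable verdict."""
    if returncode == 0:
        return "patched: the op rejected the empty splits list"
    if returncode == 3:
        return "inconclusive: tensorflow is not importable in this interpreter"
    if returncode == 4:
        return "inconclusive: the op returned without raising an error"
    if returncode < 0:
        # POSIX: a negative status means the child was killed by that signal.
        return f"vulnerable: the child was killed by signal {-returncode}"
    return f"inconclusive: the child exited with status {returncode}"


def probe(timeout: float = 300.0) -> str:
    """Run the reproducer in a separate interpreter so a crash cannot kill us."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE_SCRIPT],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return classify(result.returncode)
```

Calling `probe()` with the interpreter of the environment under test returns one verdict string; the same separate-process pattern is what the ISOLATE advice relies on to keep one crash from taking down other work.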
Exploitation Scenario
An adversary with low-privileged access to a shared MLOps platform (e.g., an internal Kubeflow or SageMaker-style cluster) submits a training job that calls tf.raw_ops.RaggedTensorToVariant with an empty ragged tensor (an empty list of splits). The TF worker process dereferences a null pointer and crashes immediately, disrupting any co-located jobs that share that worker process. By resubmitting in a loop, the attacker can sustain the denial of service, potentially across the cluster, for as long as the scheduler keeps placing the jobs. In a multi-tenant environment this becomes a low-effort disruption tool: no elevated privileges, no special TF knowledge, just a single crafted op call.
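The DETECT advice can be partly automated at submission time. A minimal sketch, assuming jobs arrive as Python source text: `find_op_references` (a hypothetical helper) parses the code with the standard-library `ast` module and reports the line numbers of any attribute access, bare name, import, or string literal naming `RaggedTensorToVariant`. It is a tripwire rather than a control: static matching is easy to evade (for example, by building the op name at runtime), so it complements patching and isolation rather than replacing them.

```python
import ast
from typing import List

TARGET_OP = "RaggedTensorToVariant"


def find_op_references(source: str, op_name: str = TARGET_OP) -> List[int]:
    """Return sorted line numbers in `source` that mention `op_name`."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and node.attr == op_name:
            hits.add(node.lineno)  # e.g. tf.raw_ops.RaggedTensorToVariant(...)
        elif isinstance(node, ast.Name) and node.id == op_name:
            hits.add(node.lineno)  # bare name after a from-import
        elif isinstance(node, (ast.Import, ast.ImportFrom)) and any(
            alias.name.split(".")[-1] == op_name for alias in node.names
        ):
            hits.add(node.lineno)  # import of the op, whatever alias it gets
        elif isinstance(node, ast.Constant) and node.value == op_name:
            hits.add(node.lineno)  # e.g. getattr(tf.raw_ops, "RaggedTensorToVariant")
    return sorted(hits)
```

In a submission hook, a non-empty result could route the job to review or to an isolated worker instead of rejecting it outright, since legitimate code may use the op.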
Weaknesses (CWE)
CWE-476 — NULL Pointer Dereference: The product dereferences a pointer that it expects to be valid but is NULL.
- [Implementation] For any pointers that could have been modified or provided from a function that can return NULL, check the pointer for NULL before use. When working with a multithreaded or otherwise asynchronous environment, ensure that proper locking APIs are used to lock before the check, and unlock when it has finished [REF-1484].
- [Requirements] Select a programming language that is not susceptible to these issues.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
- github.com/tensorflow/tensorflow/commit/b055b9c474cd376259dde8779908f9eeaf097d93 Patch 3rd Party
- github.com/tensorflow/tensorflow/security/advisories/GHSA-84mw-34w6-2q43 Exploit Patch 3rd Party
- github.com/ARPSyndicate/cvemon Exploit
Related Vulnerabilities
All entries below are in the same package (tensorflow).
| CVE | CVSS | Summary |
|---|---|---|
| CVE-2020-15196 | 9.9 | TensorFlow: heap OOB read in sparse/ragged count ops |
| CVE-2020-15205 | 9.8 | TensorFlow: heap overflow in StringNGrams, ASLR bypass |
| CVE-2020-15208 | 9.8 | TFLite: OOB read/write via tensor dimension mismatch |
| CVE-2019-16778 | 9.8 | TensorFlow: heap overflow in UnsortedSegmentSum op |
| CVE-2022-23587 | 9.8 | TensorFlow: integer overflow in Grappler enables RCE |