CVE-2021-29516: TensorFlow: null ptr deref crashes RaggedTensor ops
MEDIUM | PoC AVAILABLE
A crafted empty RaggedTensor argument crashes TensorFlow processes via null pointer dereference (CWE-476), causing availability loss in training or inference workloads. Patch immediately to TF 2.5.0 or apply available backports (2.4.2/2.3.3/2.2.3/2.1.4). Risk is highest in shared ML infrastructure where low-privileged users can submit arbitrary TF ops.
Risk Assessment
Medium risk overall, but operationally significant in multi-tenant ML environments. The local attack vector limits remote exposure — however, shared training clusters, MLOps job submission APIs, and hosted notebook platforms (Jupyter, Colab-style) expose this to low-privileged users. No data exfiltration or code execution; impact is confined to process availability. Exploitation is trivial once an attacker can execute TF code, making internal threat actors and compromised low-privilege accounts the primary concern.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| tensorflow | pip | < 2.5.0 (excluding the backport releases) | 2.5.0; backports in 2.4.2, 2.3.3, 2.2.3, 2.1.4 |
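To check one installed version against the fixed releases this advisory names (2.5.0 and the backports 2.4.2, 2.3.3, 2.2.3, 2.1.4), something like the following could be used. It is a minimal sketch: the helper name `is_patched` and the two constants are hypothetical, and it assumes plain `X.Y.Z` release strings, so pre-releases and nightly builds are rejected rather than classified.

```python
import re

# Fixed releases named in the advisory: one backport per older minor line.
FIRST_FIXED_PATCH = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
# Every release from this (major, minor) onward carries the fix.
FIRST_FIXED_MINOR = (2, 5)


def is_patched(version: str) -> bool:
    """Return True if `version` is at or above a fixed release for its minor line."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        # Assumption: anything that is not a plain X.Y.Z release is out of scope.
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) >= FIRST_FIXED_MINOR:
        return True
    first_fixed = FIRST_FIXED_PATCH.get((major, minor))
    # Minor lines with no listed backport (for example 2.0.x) count as unpatched.
    return first_fixed is not None and patch >= first_fixed
```

In a live environment the argument would typically be `tf.__version__` or `importlib.metadata.version("tensorflow")`. A dependency scanner remains the better choice for fleet-wide inventory.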
Recommended Action
1) PATCH: Upgrade to TensorFlow 2.5.0. Backports are available in 2.4.2, 2.3.3, 2.2.3, and 2.1.4; apply commit b055b9c to older pinned deployments.
2) VALIDATE: Add a pre-call assertion that the ragged tensor splits are non-empty before invoking RaggedTensorToVariant; enforce input validation at pipeline boundaries.
3) ISOLATE: In shared ML infrastructure, run tenant workloads in separate processes or containers to contain the blast radius of any crash.
4) DETECT: Alert on unexpected TensorFlow worker process crashes; audit job submission logs for RaggedTensorToVariant calls with unusual tensor shapes.
5) INVENTORY: Identify all production services pinned to TF <2.5.0 using dependency scanning (pip-audit, Snyk).
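The VALIDATE step could be sketched as a thin guard around the op call. The wrapper name `checked_ragged_to_variant` is hypothetical, and the op is passed in as a callable so the check can be exercised without TensorFlow installed. The sketch assumes the splits arrive as plain Python sequences and is deliberately conservative: it rejects an empty splits list regardless of `batched_input`, which may also refuse some inputs that a patched TensorFlow would accept.

```python
from typing import Any, Callable, Sequence


def checked_ragged_to_variant(
    op: Callable[..., Any],
    rt_nested_splits: Sequence[Sequence[int]],
    rt_dense_values: Any,
    batched_input: bool,
) -> Any:
    """Reject empty ragged splits, then forward the arguments to `op`."""
    # An empty list of splits vectors is the shape of input the advisory describes.
    if len(rt_nested_splits) == 0:
        raise ValueError("rt_nested_splits must contain at least one splits vector")
    # Assumption: an individual empty splits vector is also treated as invalid.
    for index, splits in enumerate(rt_nested_splits):
        if len(splits) == 0:
            raise ValueError(f"rt_nested_splits[{index}] is empty")
    return op(
        rt_nested_splits=rt_nested_splits,
        rt_dense_values=rt_dense_values,
        batched_input=batched_input,
    )
```

With TensorFlow available, `op` would be `tf.raw_ops.RaggedTensorToVariant`. A guard like this complements the upgrade and does not replace it.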
Frequently Asked Questions
What is CVE-2021-29516?
A crafted empty RaggedTensor argument crashes TensorFlow processes via null pointer dereference (CWE-476), causing availability loss in training or inference workloads. Patch immediately to TF 2.5.0 or apply available backports (2.4.2/2.3.3/2.2.3/2.1.4). Risk is highest in shared ML infrastructure where low-privileged users can submit arbitrary TF ops.
Is CVE-2021-29516 actively exploited?
Proof-of-concept exploit code is publicly available for CVE-2021-29516, increasing the risk of exploitation.
How to fix CVE-2021-29516?
1) PATCH: Upgrade to TensorFlow 2.5.0. Backports are available in 2.4.2, 2.3.3, 2.2.3, and 2.1.4; apply commit b055b9c to older pinned deployments.
2) VALIDATE: Add a pre-call assertion that the ragged tensor splits are non-empty before invoking RaggedTensorToVariant; enforce input validation at pipeline boundaries.
3) ISOLATE: In shared ML infrastructure, run tenant workloads in separate processes or containers to contain the blast radius of any crash.
4) DETECT: Alert on unexpected TensorFlow worker process crashes; audit job submission logs for RaggedTensorToVariant calls with unusual tensor shapes.
5) INVENTORY: Identify all production services pinned to TF <2.5.0 using dependency scanning (pip-audit, Snyk).
What systems are affected by CVE-2021-29516?
This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML notebooks, MLOps platforms, feature engineering pipelines.
What is the CVSS score for CVE-2021-29516?
CVE-2021-29516 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.01%.
Technical Details
NVD Description
TensorFlow is an end-to-end open source platform for machine learning. Calling `tf.raw_ops.RaggedTensorToVariant` with arguments specifying an invalid ragged tensor results in a null pointer dereference. The implementation of `RaggedTensorToVariant` operations (https://github.com/tensorflow/tensorflow/blob/904b3926ed1c6c70380d5313d282d248a776baa1/tensorflow/core/kernels/ragged_tensor_to_variant_op.cc#L39-L40) does not validate that the ragged tensor argument is non-empty. Since `batched_ragged` contains no elements, `batched_ragged.splits` is a null vector, thus `batched_ragged.splits(0)` will result in dereferencing `nullptr`. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
Exploitation Scenario
An adversary with low-privileged access to a shared MLOps platform (e.g., internal Kubeflow or SageMaker-style cluster) submits a training job calling tf.raw_ops.RaggedTensorToVariant with an empty ragged tensor (null splits vector). The TF worker process dereferences a null pointer and crashes immediately, disrupting any co-located training jobs on the same worker. The attacker repeats submission in a loop, causing sustained denial of service across the cluster. In a multi-tenant environment, this becomes a low-effort lateral disruption tool — no elevated privileges, no special TF knowledge, just a one-liner crafted op call.
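Because the crash in this scenario takes down the whole worker process, one way to contain and detect it is to run each submitted job in its own child process and flag exits caused by a signal. The following is a minimal sketch, not a full job runner: `run_isolated` is a hypothetical helper, the alert is just a line on stderr, and the negative-return-code convention it relies on applies to POSIX systems only.

```python
import signal
import subprocess
import sys
from typing import Optional, Sequence, Tuple


def run_isolated(
    cmd: Sequence[str], timeout: Optional[float] = None
) -> Tuple[int, Optional[str]]:
    """Run `cmd` as a child process; return (return code, signal name or None)."""
    completed = subprocess.run(list(cmd), timeout=timeout)
    return_code = completed.returncode
    if return_code >= 0:
        # Normal exit, whether successful or not: no signal involved.
        return return_code, None
    # On POSIX, a negative return code -N means the child was killed by signal N.
    try:
        signal_name = signal.Signals(-return_code).name
    except ValueError:
        signal_name = f"signal {-return_code}"
    # Placeholder alert; a real deployment would route this to its monitoring stack.
    print(f"ALERT: job {list(cmd)!r} terminated by {signal_name}", file=sys.stderr)
    return return_code, signal_name
```

A null pointer dereference in the child would normally surface here as `SIGSEGV`, while the supervising process and other tenants' jobs keep running. Repeated signal-terminated exits from the same submitter are a reasonable trigger for rate limiting or review.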
Weaknesses (CWE)
CWE-476: NULL Pointer Dereference
CVSS Vector
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
- github.com/tensorflow/tensorflow/commit/b055b9c474cd376259dde8779908f9eeaf097d93 Patch 3rd Party
- github.com/tensorflow/tensorflow/security/advisories/GHSA-84mw-34w6-2q43 Exploit Patch 3rd Party
- github.com/ARPSyndicate/cvemon Exploit
Related Vulnerabilities
- CVE-2020-15196 9.9 TensorFlow: heap OOB read in sparse/ragged count ops. Same package: tensorflow
- CVE-2020-15205 9.8 TensorFlow: heap overflow in StringNGrams, ASLR bypass. Same package: tensorflow
- CVE-2020-15208 9.8 TFLite: OOB read/write via tensor dimension mismatch. Same package: tensorflow
- CVE-2019-16778 9.8 TensorFlow: heap overflow in UnsortedSegmentSum op. Same package: tensorflow
- CVE-2022-23587 9.8 TensorFlow: integer overflow in Grappler enables RCE. Same package: tensorflow