CVE-2021-37638: TensorFlow: null ptr deref in RaggedTensorToTensor op

HIGH
Published August 12, 2021
CISO Take

Any TensorFlow deployment (2.3.x–2.5.x) accepting user-controlled tensor inputs is exposed to process crash or potential code execution via a malformed RaggedTensor API call. Patch immediately to TF 2.6.0, 2.5.1, 2.4.3, or 2.3.4. In shared ML environments (JupyterHub, model serving APIs), treat this as high-priority since low-privilege local access is all that's needed.

Risk Assessment

CVSS 7.8 with local attack vector and low privilege requirement makes this practically exploitable in any multi-tenant ML infrastructure — shared training clusters, Jupyter servers, or internal model serving endpoints. The undefined behavior from CWE-476 can escalate beyond denial-of-service into memory corruption and potential RCE, making the effective risk higher than a simple crash. No active exploitation evidence, but exploit complexity is trivial once the access threshold is met.

Affected Systems

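| Package | Ecosystem | Vulnerable Range | Patched |
| --- | --- | --- | --- |
| tensorflow | pip | 2.3.x before 2.3.4, 2.4.x before 2.4.3, 2.5.0 | 2.3.4, 2.4.3, 2.5.1, 2.6.0 |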

Severity & Risk

CVSS 3.1: 7.8 / 10
EPSS: 0.01% chance of exploitation in 30 days (higher than 2% of all CVEs)
Exploitation Status: No known exploitation
Sophistication: Trivial

Attack Surface

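Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High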

Recommended Action

  1. Patch

    Upgrade to TensorFlow 2.6.0, 2.5.1, 2.4.3, or 2.3.4 (commit 301ae88b).

  2. Input validation

    Add server-side validation to reject empty row_partition_types before passing to tf.raw_ops.RaggedTensorToTensor.

  3. Restrict API surface

    If using TF Serving or custom endpoints, disable or restrict access to raw tf.raw_ops calls from untrusted callers.

  4. Isolate

    Run model serving and training workloads in sandboxed containers with minimal privileges to limit blast radius.

  5. Detect

    Monitor for unexpected TensorFlow process crashes (segmentation faults, i.e. SIGSEGV, typically reported as exit code 139) in your ML infrastructure as an indicator of exploitation attempts.
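
For the detection step, a crash caused by a null pointer dereference normally shows up as a SIGSEGV rather than an out-of-memory kill. The sketch below is one way to label exit statuses collected from job logs or a container runtime. The function name `classify_exit` and the label strings are hypothetical, and it assumes POSIX-style conventions (a negative return code from Python's `subprocess`, or 128 + signal number from a shell or container runtime).

```python
import signal


def classify_exit(returncode):
    """Map a process exit status to a coarse label (hypothetical helper)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # subprocess reports death-by-signal as -signum on POSIX.
        signum = -returncode
    elif returncode > 128:
        # Shells and container runtimes usually report 128 + signum.
        signum = returncode - 128
    else:
        return "error"
    if signum == signal.SIGSEGV:
        return "segfault"
    if signum == signal.SIGABRT:
        return "abort"
    return "signal"


if __name__ == "__main__":
    for code in (0, 1, 139, -int(signal.SIGSEGV)):
        print(code, classify_exit(code))
```

A "segfault" label on a TensorFlow worker is only a hint that merits a look at the inputs it was processing; many unrelated bugs crash the same way. An exit code of 137 (SIGKILL, which is how OOM kills usually appear) falls into the generic "signal" bucket here and is not characteristic of this issue.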

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act: Art.15 - Accuracy, robustness and cybersecurity
ISO 42001: A.6.2.6 - Cybersecurity of AI systems
NIST AI RMF: MANAGE-2.2 - Mechanisms are in place to respond to and recover from vulnerabilities
OWASP LLM Top 10: LLM06 - Sensitive Information Disclosure

Frequently Asked Questions

What is CVE-2021-37638?

CVE-2021-37638 is a null pointer dereference (CWE-476) in TensorFlow's `tf.raw_ops.RaggedTensorToTensor` op: the kernel reads the first element of the user-supplied `row_partition_types` list without checking that the list is non-empty. Any TensorFlow deployment (2.3.x–2.5.x) that accepts user-controlled tensor inputs is exposed to a process crash, or potentially code execution, through a malformed call. The issue is fixed in TF 2.6.0, 2.5.1, 2.4.3, and 2.3.4.

Is CVE-2021-37638 actively exploited?

No confirmed active exploitation of CVE-2021-37638 has been reported, but organizations should still patch proactively.

How to fix CVE-2021-37638?

1. **Patch**: Upgrade to TensorFlow 2.6.0, 2.5.1, 2.4.3, or 2.3.4 (commit 301ae88b).
2. **Input validation**: Add server-side validation to reject empty row_partition_types before passing to tf.raw_ops.RaggedTensorToTensor.
3. **Restrict API surface**: If using TF Serving or custom endpoints, disable or restrict access to raw tf.raw_ops calls from untrusted callers.
4. **Isolate**: Run model serving and training workloads in sandboxed containers with minimal privileges to limit blast radius.
5. **Detect**: Monitor for unexpected TensorFlow process crashes (segmentation faults, i.e. SIGSEGV) in your ML infrastructure as an indicator of exploitation attempts.

What systems are affected by CVE-2021-37638?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, inference APIs, shared ML notebooks.

What is the CVSS score for CVE-2021-37638?

CVE-2021-37638 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.01%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. Sending invalid argument for `row_partition_types` of `tf.raw_ops.RaggedTensorToTensor` API results in a null pointer dereference and undefined behavior. The [implementation](https://github.com/tensorflow/tensorflow/blob/47a06f40411a69c99f381495f490536972152ac0/tensorflow/core/kernels/ragged_tensor_to_tensor_op.cc#L328) accesses the first element of a user supplied list of values without validating that the provided list is not empty. We have patched the issue in GitHub commit 301ae88b331d37a2a16159b65b255f4f9eb39314. The fix will be included in TensorFlow 2.6.0. We will also cherrypick this commit on TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in supported range.
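
The fixed releases named in the description can be turned into a quick inventory check. The following is a minimal sketch, not an official tool: `is_patched` and `FIXED_PATCH` are hypothetical names, pre-release suffixes such as `rc1` are ignored, and release lines older than 2.3 are assumed to be unpatched because the description only mentions backports to 2.3, 2.4, and 2.5.

```python
import re

# Minimum patched release per supported minor line, taken from the advisory text.
FIXED_PATCH = {(2, 3): 4, (2, 4): 3, (2, 5): 1}


def parse_version(version):
    """Extract (major, minor, patch) from a version string like '2.5.0'."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version):
    """Return True if the version should contain the fix for CVE-2021-37638."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= (2, 6):
        return True  # the fix shipped in 2.6.0
    fixed = FIXED_PATCH.get((major, minor))
    if fixed is None:
        # Assumption: unsupported older lines never received a backport.
        return False
    return patch >= fixed


if __name__ == "__main__":
    for v in ("2.3.3", "2.4.3", "2.5.0", "2.5.1", "2.6.0"):
        print(v, "patched" if is_patched(v) else "vulnerable")
```

In practice the input would come from something like `tensorflow.__version__` or a dependency manifest; the check itself does not need TensorFlow installed.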

Exploitation Scenario

An attacker with low-privilege access to a shared ML training cluster or JupyterHub server submits a crafted notebook or script that calls `tf.raw_ops.RaggedTensorToTensor(shape=..., values=..., default_value=..., row_partition_tensors=..., row_partition_types=[])` with an empty list for row_partition_types. This triggers a null pointer dereference in the kernel implementation at ragged_tensor_to_tensor_op.cc:328, crashing the TensorFlow process. In environments without memory protection or with misconfigured heap allocators, the undefined behavior may be leveraged for memory corruption and escalation to code execution — allowing the attacker to pivot within the ML infrastructure or exfiltrate model weights.
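
The scenario above hinges on an empty `row_partition_types` list reaching the kernel. On unpatched builds, a caller-side guard along the following lines could reject such input before the op is invoked. This is a sketch under stated assumptions: `validate_row_partition_types` is a hypothetical helper, and the set of accepted strings reflects the values listed in TensorFlow's op documentation as best understood here, so it should be checked against the version in use. The helper deliberately does not import or call TensorFlow.

```python
# Assumed set of valid partition type names; verify against your TF version.
ALLOWED_ROW_PARTITION_TYPES = {"FIRST_DIM_SIZE", "VALUE_ROWIDS", "ROW_SPLITS"}


def validate_row_partition_types(row_partition_types):
    """Return the types as a list, or raise if they are empty or unknown."""
    if isinstance(row_partition_types, (str, bytes)):
        raise TypeError("row_partition_types must be a list of strings")
    types = list(row_partition_types)
    if not types:
        # This is the condition that triggers the null pointer dereference.
        raise ValueError("row_partition_types must not be empty")
    for item in types:
        if not isinstance(item, str) or item not in ALLOWED_ROW_PARTITION_TYPES:
            raise ValueError(f"unsupported row partition type: {item!r}")
    return types


if __name__ == "__main__":
    print(validate_row_partition_types(["ROW_SPLITS"]))
    try:
        validate_row_partition_types([])
    except ValueError as exc:
        print("rejected:", exc)
```

A guard like this is a stopgap for request-handling code that forwards user input to raw ops; it does not replace upgrading to a fixed release.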

Weaknesses (CWE)

CWE-476: NULL Pointer Dereference

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
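
To show how the vector string relates to the 7.8 base score, the sketch below applies the CVSS v3.1 base-score equations to it. The metric weights and the rounding rule follow the published v3.1 specification as understood here; the function names are illustrative, and only base metrics are handled (no temporal or environmental scores).

```python
import math

# Base metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required weights depend on whether Scope is changed.
PR = {"U": {"N": 0.85, "L": 0.62, "H": 0.27},
      "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(value):
    """Round up to one decimal place using the v3.1 integer-based rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score for a vector string."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(part.split(":", 1) for part in parts)
    scope = m["S"]
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (8.22 * AV[m["AV"]] * AC[m["AC"]]
                      * PR[scope][m["PR"]] * UI[m["UI"]])
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum, roughly 7.71, rounds up to 7.8. The local attack vector and the low privilege requirement are what keep the score below the 9.8 that the same impact would receive with a network vector and no privileges.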

Timeline

Published: August 12, 2021
Last Modified: November 21, 2024
First Seen: August 12, 2021
