CVE-2021-37648: TensorFlow SaveV2 null pointer dereference (local)

CISO Take

A validation bypass in TensorFlow's SaveV2 kernel allows any local user to trigger a null pointer dereference, crashing training processes or potentially escalating to code execution. Shared ML compute environments—Kubeflow, JupyterHub, on-prem GPU clusters—are the primary exposure surface. Patch to TF 2.5.1, 2.4.3, 2.3.4, or 2.6.0 immediately; the fix is a single cherry-picked commit.

What is the risk?

CVSS 7.8 High with local vector, low complexity, and low privilege requirement. The root cause—OP_REQUIRES silently setting an error status and returning from the validation helper while execution continues in the parent Compute function—means the validation is completely bypassed with no runtime signal or log entry. In single-tenant environments risk is moderate; in shared ML clusters (multi-user notebook servers, distributed training farms) where low-privilege users can submit arbitrary ops, the attack surface is significantly broader and exploitability approaches trivial.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	< 2.3.4; 2.4.0 to 2.4.2; 2.5.0	2.3.4, 2.4.3, 2.5.1, 2.6.0

Do you use TensorFlow? You are affected if you run a release older than the patched versions (2.3.4, 2.4.3, 2.5.1, 2.6.0).

How severe is it?

CVSS 3.1

7.8 / 10

EPSS

0.2%

chance of exploitation in 30 days

Higher than 8% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

No known exploitation

Sophistication

Trivial

What is the attack surface?

AV Local

AC Low

PR Low

UI None

S Unchanged

C High

I High

A High
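
The metric values listed above are enough to recompute the 7.8 base score. The sketch below applies the published CVSS v3.1 base-score equations for the Scope: Unchanged case only, using the standard numeric weights for these specific metric values. The function and constant names are illustrative and do not come from any CVSS library.

```python
import math

# CVSS v3.1 numeric weights for the metric values listed above.
AV_LOCAL = 0.55          # Attack Vector: Local
AC_LOW = 0.77            # Attack Complexity: Low
PR_LOW_UNCHANGED = 0.62  # Privileges Required: Low (Scope Unchanged)
UI_NONE = 0.85           # User Interaction: None
CIA_HIGH = 0.56          # Confidentiality / Integrity / Availability: High


def roundup(value):
    """CVSS v3.1 Roundup: smallest number with one decimal place >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """Base score for Scope: Unchanged vectors only."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)       # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_LOCAL, AC_LOW, PR_LOW_UNCHANGED, UI_NONE, CIA_HIGH, CIA_HIGH, CIA_HIGH
)
print(score)  # 7.8
```

The raw sum of impact and exploitability is roughly 7.71; the Roundup step is what brings it to 7.8.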

What should I do?

5 steps

1) Upgrade to TF 2.6.0, 2.5.1, 2.4.3, or 2.3.4 (commit 9728c60e).
2) If patching is blocked, containerize training workloads in isolated single-user environments to eliminate the lateral movement path.
3) Restrict execution of raw TF ops in shared notebook environments via OPA or admission controllers.
4) Monitor for unexpected process crashes or null-deref signals in training job logs, and add alerting on SIGSEGV from TF worker processes.
5) Audit internal model training infrastructure for pinned TF versions and update dependency lock files in CI/CD pipelines.
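
The version audit in step 5 can be sketched as a small check against the fixed releases named in the advisory. This is a minimal illustration, not a replacement for a dependency scanner: `is_patched` and `find_unpatched_pins` are hypothetical helper names, only exact `==` pins in requirements-style text are recognized, and pre-release suffixes such as `rc0` are ignored, so release candidates of 2.6.0 would need a manual check.

```python
import re

# First fixed patch release per release line, taken from the advisory text.
FIXED = {(2, 3): 4, (2, 4): 3, (2, 5): 1}


def parse_version(text):
    """Extract (major, minor, patch) from a version string."""
    m = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part) for part in m.groups())


def is_patched(version):
    """Return True if the version is at or beyond a fixed release."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= (2, 6):
        return True
    fixed_patch = FIXED.get((major, minor))
    if fixed_patch is None:
        return False  # older release line with no listed fix: treat as unpatched
    return patch >= fixed_patch


def find_unpatched_pins(requirements_text):
    """List (package, version) pairs for unpatched TensorFlow pins."""
    pins = []
    pattern = re.compile(r"\s*(tensorflow(?:-gpu|-cpu)?)\s*==\s*([\w.]+)", re.I)
    for line in requirements_text.splitlines():
        m = pattern.match(line)
        if m and not is_patched(m.group(2)):
            pins.append((m.group(1), m.group(2)))
    return pins


example = "numpy==1.19.5\ntensorflow==2.4.1\ntensorflow-gpu==2.5.1\n"
print(find_unpatched_pins(example))  # [('tensorflow', '2.4.1')]
```

Running a check like this in CI against each lock file gives a quick signal before a full dependency scan completes.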

How is it classified?

Code Execution, DoS, Framework, Training Data, AML.T0010.001 - AI Software, AML.T0035 - AI Artifact Collection, AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Art. 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2.6 - AI system operation and monitoring

NIST AI RMF

MANAGE 2.2 - Mechanisms to respond to and recover from newly identified AI risks

OWASP LLM Top 10

LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

Is CVE-2021-37648 actively exploited?

No confirmed active exploitation of CVE-2021-37648 has been reported, but organizations should still patch proactively.

What systems are affected by CVE-2021-37648?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, model registry.

What is the CVSS score for CVE-2021-37648?

CVE-2021-37648 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.19%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, model registry

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0035 AI Artifact Collection

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art. 15

ISO 42001: A.6.2.6

NIST AI RMF: MANAGE 2.2

OWASP LLM Top 10: LLM05

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. In affected versions the code for `tf.raw_ops.SaveV2` does not properly validate the inputs and an attacker can trigger a null pointer dereference. The [implementation](https://github.com/tensorflow/tensorflow/blob/8d72537c6abf5a44103b57b9c2e22c14f5f49698/tensorflow/core/kernels/save_restore_v2_ops.cc) uses `ValidateInputs` to check that the input arguments are valid. This validation would have caught the illegal state represented by the reproducer in the upstream advisory. However, the validation uses `OP_REQUIRES`, which sets the `Status` object of the current `OpKernelContext` to an error status and then executes an empty `return` statement that only terminates the function it appears in. This does not finalize the kernel execution: control resumes at the line in `Compute` that follows the call to `ValidateInputs`, which is equivalent to lacking the validation. We have patched the issue in GitHub commit 9728c60e136912a12d99ca56e106b7cce7af5986. The fix will be included in TensorFlow 2.6.0. We will also cherry-pick this commit onto TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in supported range.
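
The control-flow problem described in the advisory can be modelled outside TensorFlow. The sketch below is a simplified Python analogue, not the kernel's C++ code: `Context` stands in for `OpKernelContext`, the length check is a hypothetical validation rule, and an `IndexError` takes the place of the null pointer dereference. The patched variant shows one common remedy (checking the status after the helper returns), which may not match the upstream commit line for line.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """Stand-in for OpKernelContext: it only records the first error."""
    status: Optional[str] = None  # None means OK

    def set_error(self, message: str) -> None:
        if self.status is None:
            self.status = message


def validate_inputs(ctx, names, slices):
    # Mirrors OP_REQUIRES inside a helper: record the error, then return
    # from the helper only. The caller is not interrupted.
    if len(names) != len(slices):
        ctx.set_error("tensor_names and shape_and_slices differ in length")
        return


def compute_vulnerable(ctx, names, slices):
    validate_inputs(ctx, names, slices)
    # No status check here, so execution continues with invalid inputs.
    return [slices[i] for i in range(len(names))]


def compute_patched(ctx, names, slices):
    validate_inputs(ctx, names, slices)
    if ctx.status is not None:  # stop once the helper has reported an error
        return None
    return [slices[i] for i in range(len(names))]


ctx = Context()
try:
    compute_vulnerable(ctx, ["v"], [])
except IndexError:
    print("vulnerable path ran past a failed validation:", ctx.status)

print(compute_patched(Context(), ["v"], []))  # None
```

In both variants the error is recorded on the context; only the patched variant acts on it before touching the inputs.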

Exploitation Scenario

An attacker with low-privilege shell or notebook access on a shared Kubeflow cluster crafts a session that calls tf.raw_ops.SaveV2 with intentionally malformed tensor shape arguments. ValidateInputs sets an error on the OpKernelContext and returns, but Compute continues executing past the call and dereferences a null pointer. This reliably crashes the training pod (DoS, disrupting active training runs) and, with controlled heap grooming, can be escalated to code execution under the training process's service account—enabling exfiltration of model checkpoints, training data, or cloud provider credentials stored in the pod's environment.
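
The crash in this scenario would typically surface as a worker process terminated by SIGSEGV. As a rough sketch of how a job supervisor might flag that, the example below classifies a child process's exit status on a POSIX system. `run_and_classify` is a hypothetical helper; the exit code 139 case assumes the job was launched through a shell or container runtime that reports signals as 128 plus the signal number.

```python
import signal
import subprocess
import sys


def run_and_classify(argv):
    """Run a job and report whether it died from SIGSEGV (POSIX only)."""
    completed = subprocess.run(argv)
    code = completed.returncode
    # subprocess reports death by signal N as -N; shells and container
    # runtimes commonly report it as 128 + N instead.
    if code in (-signal.SIGSEGV, 128 + signal.SIGSEGV):
        return "segfault"
    return "ok" if code == 0 else f"failed({code})"


# Simulate a crashing worker by having a child process signal itself.
crash = [sys.executable, "-c",
         "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
print(run_and_classify(crash))  # segfault
```

A "segfault" result from a TensorFlow worker is a reasonable trigger for an alert, although segmentation faults have many causes and do not by themselves indicate this CVE.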

Weaknesses (CWE)

CWE-476 NULL Pointer Dereference

CWE-476 — NULL Pointer Dereference: The product dereferences a pointer that it expects to be valid but is NULL.

[Implementation] For any pointers that could have been modified or provided from a function that can return NULL, check the pointer for NULL before use. When working with a multithreaded or otherwise asynchronous environment, ensure that proper locking APIs are used to lock before the check, and unlock when it has finished [REF-1484].
[Requirements] Select a programming language that is not susceptible to these issues.

Source: MITRE CWE corpus.