CVE-2021-37648: TensorFlow SaveV2 null pointer dereference (local crash, possible code execution)

HIGH
Published August 12, 2021
CISO Take

A validation bypass in TensorFlow's SaveV2 kernel lets any local user who can execute TensorFlow ops trigger a null pointer dereference. This crashes the process running the op and could potentially be escalated to code execution. Shared ML compute environments (Kubeflow, JupyterHub, on-prem GPU clusters) are the primary exposure surface. Upgrade to TF 2.3.4, 2.4.3, 2.5.1, or 2.6.0 immediately; the fix is a single commit, cherry-picked onto each supported branch.

Risk Assessment

CVSS 7.8 (High): local attack vector, low attack complexity, low privileges required, no user interaction. The root cause is in how the kernel uses OP_REQUIRES: when a check inside the ValidateInputs helper fails, the macro sets an error status on the OpKernelContext and returns from the helper only, while execution continues in the parent Compute function. The validation is therefore bypassed, and because the process crashes on the subsequent null pointer dereference, the recorded error never surfaces as an ordinary op failure or log entry. In single-tenant environments the risk is moderate; in shared ML clusters (multi-user notebook servers, distributed training farms) where low-privilege users can submit arbitrary ops, the attack surface is significantly broader and triggering a crash is trivial.

Affected Systems

Package     Ecosystem  Vulnerable Range                    Patched
tensorflow  pip        < 2.3.4; >= 2.4.0, < 2.4.3; 2.5.0   2.3.4, 2.4.3, 2.5.1, 2.6.0

If you run a tensorflow release earlier than 2.3.4, 2.4.0 through 2.4.2, or 2.5.0, you are affected.

Severity & Risk

CVSS 3.1
7.8 / 10
EPSS
0.03%
chance of exploitation in the next 30 days
Higher than 9% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Trivial

Attack Surface

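Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High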

Recommended Action

1. Upgrade to TF 2.6.0, 2.5.1, 2.4.3, or 2.3.4 (commit 9728c60e).
2. If patching is blocked, containerize training workloads in isolated single-user environments to eliminate the lateral movement path.
3. Restrict execution of raw TF ops in shared notebook environments via OPA or admission controllers.
4. Monitor training job logs for unexpected process crashes or null-dereference signals, and alert on SIGSEGV from TF worker processes.
5. Audit internal model training infrastructure for pinned TF versions and update dependency lock files in CI/CD pipelines.
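Steps 1 and 5 both start with knowing which installed versions are affected. The following is a minimal sketch, assuming the affected range from the NVD entry (every release before 2.3.4, 2.4.0 through 2.4.2, and 2.5.0). `is_vulnerable` and `parse_version` are hypothetical helpers, not TensorFlow APIs. Pre-release and local suffixes such as `rc0` are ignored, so a release candidate is treated like its final release, which may be wrong for early 2.6.0 candidates.

```python
import re
from typing import Tuple

# First fixed release on each supported branch, per the advisory.
FIXED_IN_BRANCH = {(2, 3): (2, 3, 4), (2, 4): (2, 4, 3), (2, 5): (2, 5, 1)}
FIRST_FIXED_MAINLINE = (2, 6, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch); pre-release/local suffixes are ignored."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = m.groups(default="0")
    return int(major), int(minor), int(patch)


def is_vulnerable(version: str) -> bool:
    """Hypothetical check: is this TensorFlow version affected by CVE-2021-37648?"""
    v = parse_version(version)
    if v >= FIRST_FIXED_MAINLINE:
        return False
    fixed = FIXED_IN_BRANCH.get(v[:2])
    if fixed is not None:
        return v < fixed
    # Branches older than 2.3 received no backport; assumed affected.
    return True
```

In an environment where the package is installed under the distribution name `tensorflow`, the check can be fed from `importlib.metadata.version("tensorflow")`. In CI, the same function can be run against the versions pinned in lock files.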

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act: Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001: A.6.2.6 - AI system operation and monitoring
NIST AI RMF: MANAGE 2.2 - Mechanisms to respond to and recover from newly identified AI risks
OWASP LLM Top 10: LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

Is CVE-2021-37648 actively exploited?

No confirmed active exploitation of CVE-2021-37648 has been reported, but organizations should still patch proactively.

What systems are affected by CVE-2021-37648?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, model registry.

What is the CVSS score for CVE-2021-37648?

CVE-2021-37648 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.03%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. In affected versions the code for `tf.raw_ops.SaveV2` does not properly validate the inputs and an attacker can trigger a null pointer dereference. The [implementation](https://github.com/tensorflow/tensorflow/blob/8d72537c6abf5a44103b57b9c2e22c14f5f49698/tensorflow/core/kernels/save_restore_v2_ops.cc) uses `ValidateInputs` to check that the input arguments are valid. This validation would have caught the illegal state represented by the reproducer above [not reproduced on this page; see the upstream TensorFlow security advisory]. However, the validation uses `OP_REQUIRES` which translates to setting the `Status` object of the current `OpKernelContext` to an error status, followed by an empty `return` statement which just terminates the execution of the function it is present in. However, this does not mean that the kernel execution is finalized: instead, execution continues from the next line in `Compute` that follows the call to `ValidateInputs`. This is equivalent to lacking the validation. We have patched the issue in GitHub commit 9728c60e136912a12d99ca56e106b7cce7af5986. The fix will be included in TensorFlow 2.6.0. We will also cherrypick this commit on TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in supported range.
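The control-flow flaw is easier to see in a small model. The sketch below is Python, not TensorFlow code. `Status`, `OpKernelContext`, `validate_inputs`, and both compute functions are simplified, hypothetical stand-ins for the C++ kernel, and the single length check is illustrative rather than the real `ValidateInputs` logic. The model shows that returning from the helper does not stop the caller. It also shows that re-checking the context status after the call closes the gap; in C++ this is typically written as `if (!context->status().ok()) return;`. Consult commit 9728c60e for the exact upstream change.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Status:
    ok: bool = True
    message: str = ""


@dataclass
class OpKernelContext:
    """Hypothetical stand-in for the C++ context: it only carries a Status."""
    status: Status = field(default_factory=Status)


def validate_inputs(ctx: OpKernelContext, tensor_names: List[str], tensors: List[object]) -> None:
    # Models OP_REQUIRES: on failure, record an error on the context and
    # return from THIS helper only. The caller keeps running.
    if len(tensor_names) != len(tensors):  # illustrative check, not the real one
        ctx.status = Status(False, "tensor_names and tensors differ in length")
        return


def compute_vulnerable(ctx: OpKernelContext, tensor_names: List[str], tensors: List[object]) -> str:
    validate_inputs(ctx, tensor_names, tensors)
    # No status check: execution falls through even after a failed check.
    # In the real kernel, this is roughly where invalid input gets dereferenced.
    return "continued past validation"


def compute_fixed(ctx: OpKernelContext, tensor_names: List[str], tensors: List[object]) -> str:
    validate_inputs(ctx, tensor_names, tensors)
    if not ctx.status.ok:  # make the helper's error stick
        return "stopped after validation error"
    return "continued past validation"
```

With mismatched inputs such as `["v"]` and `[]`, `compute_vulnerable` returns "continued past validation" even though `ctx.status.ok` is already `False`, while `compute_fixed` returns "stopped after validation error".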

Exploitation Scenario

An attacker with low-privilege shell or notebook access on a shared Kubeflow cluster calls tf.raw_ops.SaveV2 with deliberately malformed tensor shape arguments. ValidateInputs sets an error on the OpKernelContext and returns, but Compute continues executing past the call and dereferences a null pointer. This reliably crashes the process running the op (denial of service that disrupts active training runs in that pod). If the crash could be escalated to code execution (for example through heap grooming; no public exploit demonstrates this), the attacker's code would run under the training process's service account. That would enable exfiltration of model checkpoints, training data, or cloud provider credentials stored in the pod's environment.
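The crash in this scenario terminates the worker with SIGSEGV, which is what the monitoring recommendation targets. A minimal sketch follows. It assumes a POSIX host and a supervisor that launches TF workers as child processes through Python's `subprocess` module. On such a host, `subprocess` reports death by signal N as return code -N. `run_worker` and the alert line are hypothetical, not part of TensorFlow or Kubeflow.

```python
import signal
import subprocess
import sys
from typing import List, Tuple


def run_worker(cmd: List[str]) -> Tuple[int, bool]:
    """Hypothetical supervisor helper: run a worker, flag a SIGSEGV exit.

    Assumes a POSIX host, where subprocess reports "killed by signal N"
    as a return code of -N.
    """
    proc = subprocess.run(cmd)
    crashed = proc.returncode == -signal.SIGSEGV
    if crashed:
        # Hypothetical alert format; route this into real alerting instead.
        print(f"ALERT: worker {cmd!r} terminated by SIGSEGV", file=sys.stderr)
    return proc.returncode, crashed
```

In containerized deployments, a segfaulting main process is usually reported as exit code 139 (128 + 11) rather than -11, so an equivalent check there would match on that code.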

Weaknesses (CWE)

CWE-476: NULL Pointer Dereference

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
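The 7.8 base score follows mechanically from this vector. Below is a sketch of the CVSS v3.1 base-score equations as published by FIRST, written only to check that arithmetic. It covers base metrics only; temporal and environmental metrics are omitted. The function names are illustrative.

```python
import math
from typing import Dict

# Metric weights from the CVSS v3.1 specification (FIRST).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value (float-safe form)."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def parse_vector(vector: str) -> Dict[str, str]:
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"not a CVSS v3.1 vector: {vector!r}")
    return dict(m.split(":", 1) for m in metrics)


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    changed = m["S"] == "C"
    # Impact Sub-Score from the three impact metrics.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))
```

For this CVE, Exploitability = 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.835 and Impact = 6.42 × (1 − 0.44³) ≈ 5.873. Roundup(1.835 + 5.873) = Roundup(7.708) = 7.8, matching the published score; `base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H")` returns 7.8.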

Timeline

Published: August 12, 2021
Last Modified: November 21, 2024
First Seen: August 12, 2021

Related Vulnerabilities