CVE-2021-37648: TensorFlow SaveV2: null ptr deref, local crash/RCE
HIGH
A validation bypass in TensorFlow's SaveV2 kernel allows any local user to trigger a null pointer dereference, crashing training processes or potentially escalating to code execution. Shared ML compute environments (Kubeflow, JupyterHub, on-prem GPU clusters) are the primary exposure surface. Patch to TF 2.5.1, 2.4.3, 2.3.4, or 2.6.0 immediately; the fix is a single cherry-picked commit.
Risk Assessment
CVSS 7.8 (High): local attack vector, low complexity, low privileges required. The root cause is that OP_REQUIRES, used inside the ValidateInputs helper, sets an error status and returns only from that helper, while execution continues in the parent Compute function. The validation is therefore effectively bypassed, with no runtime signal or log entry. In single-tenant environments the risk is moderate; in shared ML clusters (multi-user notebook servers, distributed training farms) where low-privilege users can submit arbitrary ops, the attack surface is significantly broader and exploitation is close to trivial.
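The 7.8 score can be reproduced from the vector listed under CVSS Vector on this page. The sketch below applies the published CVSS v3.1 base-score equations; it handles only Scope: Unchanged vectors, and the helper names `roundup` and `base_score` are illustrative rather than part of any library.

```python
import math

# CVSS v3.1 metric weights. PR uses the Scope: Unchanged weights only.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a Scope: Unchanged CVSS v3.1 vector."""
    # Skip the leading "CVSS:3.1" token, then split "AV:L" style pairs.
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {metric: WEIGHTS[metric][parts[metric]] for metric in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 1.83, so the sum of roughly 7.71 rounds up to 7.8.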
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| tensorflow | pip | < 2.3.4; 2.4.0 to 2.4.2; 2.5.0 | 2.3.4, 2.4.3, 2.5.1, 2.6.0 |
Do you use tensorflow? You are affected if you run a release older than 2.3.4, 2.4.3, 2.5.1, or 2.6.0 on its respective release line.
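A quick way to check a given version string against the patched releases named in the advisory is sketched below. It uses only the standard library. The function names are hypothetical, pre-release suffixes such as `rc1` are ignored, and release lines older than 2.3 are treated as unpatched because the advisory lists no cherry-pick for them; both choices are assumptions of this sketch.

```python
import re

# Earliest fixed patch release on each cherry-picked 2.x line (from the advisory).
FIXED_PATCH = {(2, 3): 4, (2, 4): 3, (2, 5): 1}


def parse_version(text):
    """Extract (major, minor, patch) from a version string like '2.5.0'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if not match:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version_text):
    """Return True if the version includes the fix for CVE-2021-37648."""
    major, minor, patch = parse_version(version_text)
    if (major, minor) >= (2, 6):
        return True  # 2.6.0 and later ship with the fix
    if (major, minor) in FIXED_PATCH:
        return patch >= FIXED_PATCH[(major, minor)]
    return False  # assumption: older lines received no backport


for candidate in ["2.3.3", "2.4.3", "2.5.0", "2.6.0"]:
    status = "patched" if is_patched(candidate) else "VULNERABLE"
    print(candidate, status)
```

In a live environment the input could come from `tensorflow.__version__` or from `importlib.metadata.version("tensorflow")`, which avoids importing the library itself.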
Recommended Action
1) Upgrade to TF 2.6.0, 2.5.1, 2.4.3, or 2.3.4 (commit 9728c60e).
2) If patching is blocked, containerize training workloads in isolated single-user environments to eliminate the lateral movement path.
3) Restrict execution of raw TF ops in shared notebook environments via OPA or admission controllers.
4) Monitor for unexpected process crashes or null-deref signals in training job logs; add alerting on SIGSEGV from TF worker processes.
5) Audit internal model training infrastructure for pinned TF versions and update dependency lock files in CI/CD pipelines.
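The SIGSEGV alerting in step 4 could be prototyped with a small wrapper around the training command. This is a minimal sketch, assuming a POSIX host where Python reports a signal-terminated child as a negative return code; `classify_exit` and `run_and_report` are hypothetical names, and the alert is just a line on stderr that a real deployment would route to its own alerting system.

```python
import signal
import subprocess
import sys


def classify_exit(returncode):
    """Map a subprocess return code to a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_report(argv):
    """Run a command and print an alert if it died from SIGSEGV."""
    result = subprocess.run(argv)
    label = classify_exit(result.returncode)
    if label == "killed by SIGSEGV":
        print(f"ALERT: {argv!r} crashed with SIGSEGV", file=sys.stderr)
    return label


if __name__ == "__main__":
    # Placeholder command; substitute the real training entry point.
    print(run_and_report([sys.executable, "-c", "pass"]))
```

Shells and container runtimes usually report the same crash as exit code 139 (128 plus signal number 11), so log-based alerting on training pods would look for that value instead.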
Frequently Asked Questions
What is CVE-2021-37648?
CVE-2021-37648 is a null pointer dereference in TensorFlow's `tf.raw_ops.SaveV2` kernel. Input validation sets an error status but does not stop kernel execution, so a local low-privilege user can crash the process with malformed inputs. It is fixed in TF 2.6.0, 2.5.1, 2.4.3, and 2.3.4.
Is CVE-2021-37648 actively exploited?
No confirmed active exploitation of CVE-2021-37648 has been reported, but organizations should still patch proactively.
How to fix CVE-2021-37648?
Upgrade to TF 2.6.0, 2.5.1, 2.4.3, or 2.3.4, all of which include fix commit 9728c60e. If patching is blocked, apply the compensating controls listed under Recommended Action: isolate training workloads in single-user containers, restrict raw TF ops in shared notebook environments, and alert on SIGSEGV from TF worker processes.
What systems are affected by CVE-2021-37648?
This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, model registry.
What is the CVSS score for CVE-2021-37648?
CVE-2021-37648 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.03%.
Technical Details
NVD Description
TensorFlow is an end-to-end open source platform for machine learning. In affected versions the code for `tf.raw_ops.SaveV2` does not properly validate the inputs and an attacker can trigger a null pointer dereference. The [implementation](https://github.com/tensorflow/tensorflow/blob/8d72537c6abf5a44103b57b9c2e22c14f5f49698/tensorflow/core/kernels/save_restore_v2_ops.cc) uses `ValidateInputs` to check that the input arguments are valid. This validation would have caught the illegal state represented by the reproducer above. However, the validation uses `OP_REQUIRES` which translates to setting the `Status` object of the current `OpKernelContext` to an error status, followed by an empty `return` statement which just terminates the execution of the function it is present in. However, this does not mean that the kernel execution is finalized: instead, execution continues from the next line in `Compute` that follows the call to `ValidateInputs`. This is equivalent to lacking the validation. We have patched the issue in GitHub commit 9728c60e136912a12d99ca56e106b7cce7af5986. The fix will be included in TensorFlow 2.6.0. We will also cherrypick this commit on TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in supported range.
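The control-flow problem described above can be modeled in a few lines. The following is a plain Python analogy, not TensorFlow code: `Context` stands in for `OpKernelContext`, the helper mimics what `OP_REQUIRES` does inside `ValidateInputs`, and a `TypeError` on `None` stands in for the null pointer dereference. The checked variant shows one common way to close such a gap, namely testing the context status after the helper returns; it is an illustration of the pattern, not a copy of the upstream patch.

```python
class Context:
    """Stand-in for OpKernelContext: it only records the first error."""

    def __init__(self):
        self.error = None

    def set_error(self, message):
        if self.error is None:
            self.error = message

    def ok(self):
        return self.error is None


def validate_inputs(ctx, tensor):
    # Mimics OP_REQUIRES inside a helper: record the error, then return
    # from the helper only. Nothing propagates to the caller.
    if tensor is None:
        ctx.set_error("invalid input")
        return


def compute_unchecked(ctx, tensor):
    validate_inputs(ctx, tensor)
    # Execution continues here even though validation failed.
    return len(tensor)  # raises TypeError when tensor is None


def compute_checked(ctx, tensor):
    validate_inputs(ctx, tensor)
    if not ctx.ok():
        return None  # stop before touching the invalid input
    return len(tensor)


ctx = Context()
try:
    compute_unchecked(ctx, None)
except TypeError:
    print("unchecked: crashed after validation reported:", ctx.error)

print("checked:", compute_checked(Context(), None))
```

The point of the model is that the error is recorded correctly in both variants; only the caller's decision to inspect it separates a clean failure from a crash.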
Exploitation Scenario
An attacker with low-privilege shell or notebook access on a shared Kubeflow cluster crafts a session that calls tf.raw_ops.SaveV2 with intentionally malformed tensor shape arguments. ValidateInputs sets an error on the OpKernelContext and returns, but Compute continues executing past the call and dereferences a null pointer. This reliably crashes the training pod, a denial of service that disrupts active training runs. With controlled heap grooming it could potentially be escalated to code execution under the training process's service account, enabling exfiltration of model checkpoints, training data, or cloud provider credentials stored in the pod's environment.
Weaknesses (CWE)
CVSS Vector
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
References
Timeline
Related Vulnerabilities
| CVE | CVSS | Summary | Relation |
|---|---|---|---|
| CVE-2020-15196 | 9.9 | TensorFlow: heap OOB read in sparse/ragged count ops | Same package: tensorflow |
| CVE-2020-15205 | 9.8 | TensorFlow: heap overflow in StringNGrams, ASLR bypass | Same package: tensorflow |
| CVE-2020-15208 | 9.8 | TFLite: OOB read/write via tensor dimension mismatch | Same package: tensorflow |
| CVE-2019-16778 | 9.8 | TensorFlow: heap overflow in UnsortedSegmentSum op | Same package: tensorflow |
| CVE-2022-23587 | 9.8 | TensorFlow: integer overflow in Grappler enables RCE | Same package: tensorflow |