CVE-2021-37649: TensorFlow: null ptr deref crashes inference via bad tensor

MEDIUM
Published August 12, 2021
CISO Take

A local attacker with low privileges can crash a TensorFlow process that runs their code by passing tf.raw_ops.UncompressElement a Variant tensor that does not hold a CompressedElement, which triggers a null pointer dereference. Impact is limited to availability (process crash/DoS), but shared inference servers and multi-tenant ML platforms widen the blast radius. Upgrade to TensorFlow 2.6.0, 2.5.1, 2.4.3, or 2.3.4.

What is the risk?

Medium risk overall, but contextually elevated in production inference environments. CVSS 5.5 reflects local-only attack vector and low-privilege requirement, which limits opportunistic exploitation. However, in containerized or shared ML serving infrastructure, a single malicious or misconfigured client job could repeatedly crash the TensorFlow runtime, degrading service availability. No active exploitation or public PoC weaponization reported. Not in CISA KEV.

What systems are affected?

Package      Ecosystem   Vulnerable Range                              Patched
TensorFlow   pip         before 2.3.4; 2.4.x before 2.4.3; 2.5.0       2.3.4, 2.4.3, 2.5.1, 2.6.0

Do you use TensorFlow? You are affected if your installed release predates the fixed versions (2.6.0, 2.5.1, 2.4.3, 2.3.4).
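Whether an installation carries the fix can be checked by comparing its version against the fixed releases named in the advisory. The following is a minimal sketch, not an official tool: `has_fix` and `parse_version` are hypothetical helper names, pre-release suffixes such as `rc0` are ignored, and release lines older than 2.3 are conservatively treated as unfixed because the advisory lists no backport for them.

```python
import re

# Fixed releases named in the advisory, keyed by (major, minor) release line.
FIXED = {(2, 3): (2, 3, 4), (2, 4): (2, 4, 3), (2, 5): (2, 5, 1)}
FIRST_FIXED_LINE = (2, 6)  # 2.6.0 and every later release line include the fix


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from strings such as '2.5.0' or '2.4.1rc0'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(version: str) -> bool:
    """Return True if the given TensorFlow version is at or above its fixed release."""
    parsed = parse_version(version)
    line = parsed[:2]
    if line >= FIRST_FIXED_LINE:
        return True
    fixed = FIXED.get(line)
    # Assumption: no backport is listed for older lines, so treat them as unfixed.
    return fixed is not None and parsed >= fixed
```

In an environment where TensorFlow is installed, the input would typically be `tf.__version__`, for example `has_fix(tf.__version__)`.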

How severe is it?

CVSS 3.1
5.5 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 5% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Trivial

What is the attack surface?

AV Local
AC Low
PR Low
UI None
S Unchanged
C None
I None
A High

What should I do?

  1. Patch: Upgrade to TensorFlow 2.6.0, 2.5.1, 2.4.3, or 2.3.4 — all contain commit 7bdf50bb4f5c54a4997c379092888546c97c3ebd.

  2. Workaround (if patching is blocked): Audit and restrict code paths invoking tf.raw_ops.UncompressElement; add input validation to verify Variant tensors contain a valid CompressedElement before decompression.

  3. Detection: Monitor TensorFlow processes for crashes that end in a segmentation fault (SIGSEGV), the usual result of a null pointer dereference, in ML serving infrastructure; alert on unexpected restarts of model server pods.

  4. Containment: Isolate TF runtime processes per user/job in multi-tenant environments to limit blast radius of a crash.
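For the detection step, it helps to separate segmentation faults from other abnormal exits. A null pointer dereference normally ends the process with SIGSEGV, while an out-of-memory kill normally appears as SIGKILL. The sketch below decodes the two common exit-code encodings: a negative value, as reported by Python's `subprocess` module, and 128 plus the signal number, as reported by shells and container runtimes. The function names are hypothetical.

```python
import signal


def crash_signal(exit_code: int):
    """Return the name of the terminating signal, or None if none can be inferred."""
    if exit_code < 0:
        number = -exit_code          # subprocess convention: -N means signal N
    elif exit_code > 128:
        number = exit_code - 128     # shell/container convention: 128 + N
    else:
        return None
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number does not correspond to a known signal on this platform.
        return None


def is_null_deref_suspect(exit_code: int) -> bool:
    """Flag exits consistent with a segmentation fault."""
    return crash_signal(exit_code) == "SIGSEGV"
```

A SIGSEGV exit is only a hint: other memory-safety bugs produce the same signal, so a flagged restart still needs to be correlated with the workload that ran.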

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, Robustness and Cybersecurity
Article 9 - Risk Management System
ISO 42001
A.6.2 - AI System Robustness and Availability
NIST AI RMF
MANAGE 2.2 - Mechanisms to sustain AI system reliability

Frequently Asked Questions

What is CVE-2021-37649?

CVE-2021-37649 is a null pointer dereference (CWE-476) in TensorFlow's tf.raw_ops.UncompressElement. The op does not check that its Variant input holds a CompressedElement, so a local, low-privileged attacker who can run code in a TensorFlow process can crash it. Impact is limited to availability (process crash/DoS). The fix ships in TensorFlow 2.6.0, 2.5.1, 2.4.3, and 2.3.4.

Is CVE-2021-37649 actively exploited?

No confirmed active exploitation of CVE-2021-37649 has been reported, but organizations should still patch proactively.

How to fix CVE-2021-37649?

  1. Patch: Upgrade to TensorFlow 2.6.0, 2.5.1, 2.4.3, or 2.3.4, all of which contain commit 7bdf50bb4f5c54a4997c379092888546c97c3ebd.

  2. Workaround (if patching is blocked): Audit and restrict code paths invoking tf.raw_ops.UncompressElement; add input validation to verify Variant tensors contain a valid CompressedElement before decompression.

  3. Detection: Monitor TensorFlow processes for crashes that end in a segmentation fault (SIGSEGV) in ML serving infrastructure; alert on unexpected restarts of model server pods.

  4. Containment: Isolate TF runtime processes per user/job in multi-tenant environments to limit blast radius of a crash.

What systems are affected by CVE-2021-37649?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, data preprocessing.

What is the CVSS score for CVE-2021-37649?

CVE-2021-37649 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.16%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, data preprocessing

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0029 Denial of AI Service
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15, Article 9
ISO 42001: A.6.2
NIST AI RMF: MANAGE 2.2

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. The code for `tf.raw_ops.UncompressElement` can be made to trigger a null pointer dereference. The [implementation](https://github.com/tensorflow/tensorflow/blob/f24faa153ad31a4b51578f8181d3aaab77a1ddeb/tensorflow/core/kernels/data/experimental/compression_ops.cc#L50-L53) obtains a pointer to a `CompressedElement` from a `Variant` tensor and then proceeds to dereference it for decompressing. There is no check that the `Variant` tensor contained a `CompressedElement`, so the pointer is actually `nullptr`. We have patched the issue in GitHub commit 7bdf50bb4f5c54a4997c379092888546c97c3ebd. The fix will be included in TensorFlow 2.6.0. We will also cherrypick this commit on TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in supported range.
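The pattern the advisory describes can be modeled in a few lines. This is an illustrative Python analogy, not TensorFlow code: `Variant`, `CompressedElement`, and both functions are hypothetical stand-ins, with `None` playing the role of `nullptr`. In Python the unchecked path raises an `AttributeError`; in the C++ kernel the equivalent dereference crashes the process.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CompressedElement:
    payload: bytes


@dataclass
class Variant:
    """Hypothetical stand-in for a type-erased container."""
    value: Any

    def get(self, expected_type: type) -> Optional[Any]:
        # Models a typed accessor that yields a null pointer on a type mismatch.
        return self.value if isinstance(self.value, expected_type) else None


def uncompress_unchecked(variant: Variant) -> bytes:
    """Vulnerable shape: uses the result without checking it."""
    compressed = variant.get(CompressedElement)
    return compressed.payload  # fails when compressed is None


def uncompress_checked(variant: Variant) -> bytes:
    """Fixed shape: rejects the input before touching the pointer."""
    compressed = variant.get(CompressedElement)
    if compressed is None:
        raise ValueError("input does not contain a compressed element")
    return compressed.payload
```

The checked variant turns a crash into an ordinary, catchable error, which is the general shape of a fix for CWE-476.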

Exploitation Scenario

An adversary with access to a shared ML inference server (e.g., a data scientist on a multi-tenant Kubeflow cluster) submits a TensorFlow job that calls tf.raw_ops.UncompressElement with a Variant tensor that does not contain a CompressedElement. The TF runtime dereferences a null pointer, causing the process to crash. In a Kubernetes environment this triggers an automatic restart, but repeated crashes constitute a DoS against the serving infrastructure, potentially delaying model inference for all tenants. No special tooling required — a single malicious Python snippet is sufficient.
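The scenario above depends on one tenant's job sharing a process with everyone else's work. A minimal sketch of per-job isolation follows, assuming a POSIX host and jobs supplied as Python source. `run_isolated` is a hypothetical helper. It shows only that a crash in a child interpreter leaves the parent running; it is not a sandbox and does not restrict what the job may do.

```python
import signal
import subprocess
import sys


def run_isolated(job_source: str, timeout: float = 60.0) -> dict:
    """Run a job in a child interpreter and report how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", job_source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    killed_by = None
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from that signal.
        killed_by = signal.Signals(-proc.returncode).name
    return {
        "ok": proc.returncode == 0,
        "killed_by_signal": killed_by,
        "stdout": proc.stdout,
    }
```

A job that segfaults comes back with `killed_by_signal` set to `"SIGSEGV"` while the calling service keeps running, so the failure can be attributed to a single tenant instead of stalling inference for all of them.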

Weaknesses (CWE)

CWE-476 — NULL Pointer Dereference: The product dereferences a pointer that it expects to be valid but is NULL.

  • [Implementation] For any pointers that could have been modified or provided from a function that can return NULL, check the pointer for NULL before use. When working with a multithreaded or otherwise asynchronous environment, ensure that proper locking APIs are used to lock before the check, and unlock when it has finished [REF-1484].
  • [Requirements] Select a programming language that is not susceptible to these issues.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
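The 5.5 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The sketch below uses the metric weights and the round-up rule from the CVSS v3.1 specification. `base_score` is a hypothetical helper name, and temporal and environmental metrics are out of scope.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on whether Scope is Unchanged or Changed.
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    # Drop the "CVSS:3.1" prefix and map each metric to its value.
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    scope = metrics["S"]
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[metrics["C"]]) * (1 - cia[metrics["I"]]) * (1 - cia[metrics["A"]])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[scope][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))
```

For this CVE the exploitability sub-score is about 1.83 and the impact sub-score is about 3.60. Their sum, about 5.43, rounds up to 5.5.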

Timeline

Published
August 12, 2021
Last Modified
November 21, 2024
First Seen
August 12, 2021

Related Vulnerabilities