CVE-2021-37639: TensorFlow: heap OOB read via tensor restore API

HIGH
Published August 12, 2021
CISO Take

A low-privilege local attacker can crash a TensorFlow process or read heap memory beyond the bounds of an allocation by supplying missing or too few tensor names to the raw tensor restore API. Shared ML training infrastructure (GPU clusters, Jupyter hubs, multi-tenant cloud notebooks) is the primary risk surface, because that is where cross-user data leakage is plausible. Upgrade all TensorFlow deployments to 2.6.0 or to the backported 2.5.1, 2.4.3, or 2.3.4 releases; no workaround substitutes for the patch.

What is the risk?

CVSS 7.8 (High), with a local attack vector and a low privilege requirement. Risk is elevated in multi-tenant ML infrastructure where untrusted or compromised users can invoke TensorFlow operations. Beyond simple denial of service, the heap OOB read (CWE-125) poses a data leakage risk: heap memory may contain model weights, training samples, or in-process credentials. The CVE is not in the CISA KEV catalog and there is no evidence of active exploitation, but low attack complexity and low required privileges (AC:L, PR:L) put exploitation within reach of motivated insiders.

What systems are affected?

Package | Ecosystem | Vulnerable Range | Patched
TensorFlow | pip | Releases before 2.3.4; 2.4.x before 2.4.3; 2.5.x before 2.5.1 | 2.3.4, 2.4.3, 2.5.1, 2.6.0

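Do you use TensorFlow? You're affected if any deployment runs a release that predates the fix (2.6.0, or the 2.5.1, 2.4.3, and 2.3.4 backports).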

How severe is it?

CVSS 3.1
7.8 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 7% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Moderate

What is the attack surface?

AV Local
AC Low
PR Low
UI None
S Unchanged
C High
I High
A High

What should I do?

  1. Upgrade TensorFlow to 2.6.0 or apply backports: 2.5.1, 2.4.3, or 2.3.4 (all contain patch commit 9e82dce6e6bd1f36a57e08fa85af213e2b2f2622).

  2. Audit any code invoking raw save/restore tensor APIs—validate tensor_name is non-empty and preferred_shard is within bounds before calling.

  3. Restrict unprivileged users from direct access to raw TF checkpoint/session APIs in shared environments.

  4. Monitor ML infrastructure for unexpected process segfaults or OOB errors in application logs as exploitation indicators.

  5. In shared training clusters, enforce process isolation (namespaces, separate nodes) between untrusted workloads.
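The first step can be checked programmatically. The following is a minimal sketch, not an official tool: it compares a version string against the patched releases named in the advisory. The helper names `parse_version` and `is_patched` are hypothetical, pre-release suffixes such as `rc1` are ignored, and releases older than 2.3 are treated as unpatched because the advisory names no backport for them (both are assumptions).

```python
import re

# Patched backports named in the advisory, keyed by (major, minor) series.
PATCHED_BACKPORTS = {
    (2, 3): (2, 3, 4),
    (2, 4): (2, 4, 3),
    (2, 5): (2, 5, 1),
}
# First mainline release that includes the fix, per the advisory.
FIRST_FIXED_MAINLINE = (2, 6, 0)


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '2.5.0' or '2.6.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version: str) -> bool:
    """Return True if the version is at or above a release that contains the fix."""
    parsed = parse_version(version)
    if parsed >= FIRST_FIXED_MAINLINE:
        return True
    fixed = PATCHED_BACKPORTS.get(parsed[:2])
    # Series without a named backport (assumed: anything below 2.3) count as unpatched.
    return fixed is not None and parsed >= fixed
```

In an environment where TensorFlow is installed, `is_patched(tf.__version__)` after `import tensorflow as tf` would report on the running interpreter; the function itself takes only a string, so it can also be run against an inventory of pinned versions.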

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
6.1.2 - AI risk treatment
NIST AI RMF
MANAGE-2.2 - Mechanisms to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM02:2025 - Sensitive Information Disclosure

Frequently Asked Questions

What is CVE-2021-37639?

CVE-2021-37639 is a high-severity flaw in TensorFlow's raw tensor restore API. If no tensor name is provided, TensorFlow dereferences a null pointer; if some names are provided but the restoration index falls outside the list, TensorFlow reads heap memory out of bounds (CWE-125). A low-privilege local attacker can use this to crash a TensorFlow process or leak heap contents, which matters most on shared ML training infrastructure. The flaw is fixed in TensorFlow 2.6.0 and in the backported 2.5.1, 2.4.3, and 2.3.4 releases.

Is CVE-2021-37639 actively exploited?

No confirmed active exploitation of CVE-2021-37639 has been reported, but organizations should still patch proactively.

How to fix CVE-2021-37639?

  1. Upgrade TensorFlow to 2.6.0 or apply backports: 2.5.1, 2.4.3, or 2.3.4 (all contain patch commit 9e82dce6e6bd1f36a57e08fa85af213e2b2f2622).

  2. Audit any code invoking raw save/restore tensor APIs: validate that tensor_name is non-empty and preferred_shard is within bounds before calling.

  3. Restrict unprivileged users from direct access to raw TF checkpoint/session APIs in shared environments.

  4. Monitor ML infrastructure for unexpected process segfaults or OOB errors in application logs as exploitation indicators.

  5. In shared training clusters, enforce process isolation (namespaces, separate nodes) between untrusted workloads.
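The monitoring step can be approximated with a small log scan. This is a sketch under stated assumptions: the crash markers in `CRASH_PATTERN` are generic strings commonly seen when a process dies from an invalid memory access, not messages specific to this CVE, and `find_crash_lines` is a hypothetical helper name. A match indicates a crash worth investigating, not proof of exploitation.

```python
import re
from typing import Iterable, List, Tuple

# Assumed crash markers; adjust to what your runtime and OS actually log.
CRASH_PATTERN = re.compile(
    r"segmentation fault|SIGSEGV|core dumped|signal 11", re.IGNORECASE
)


def find_crash_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every log line that matches a crash marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if CRASH_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Because the function accepts any iterable of strings, it can be fed an open file handle directly, for example `find_crash_lines(open(path))` with a log path of your choosing.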

What systems are affected by CVE-2021-37639?

This vulnerability affects the following AI/ML architecture patterns: Training pipelines, Model checkpointing workflows, Shared ML training infrastructure, Model serving with checkpoint loading, ML development environments (Jupyter, notebooks).

What is the CVSS score for CVE-2021-37639?

CVE-2021-37639 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.17%.

What is the AI security impact?

Affected AI Architectures

Training pipelines
Model checkpointing workflows
Shared ML training infrastructure
Model serving with checkpoint loading
ML development environments (Jupyter, notebooks)

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0035 AI Artifact Collection
AML.T0037 Data from Local System

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: 6.1.2
NIST AI RMF: MANAGE-2.2
OWASP LLM Top 10: LLM02:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. When restoring tensors via raw APIs, if the tensor name is not provided, TensorFlow can be tricked into dereferencing a null pointer. Alternatively, attackers can read memory outside the bounds of heap allocated data by providing some tensor names but not enough for a successful restoration. The [implementation](https://github.com/tensorflow/tensorflow/blob/47a06f40411a69c99f381495f490536972152ac0/tensorflow/core/kernels/save_restore_tensor.cc#L158-L159) retrieves the tensor list corresponding to the `tensor_name` user controlled input and immediately retrieves the tensor at the restoration index (controlled via `preferred_shard` argument). This occurs without validating that the provided list has enough values. If the list is empty this results in dereferencing a null pointer (undefined behavior). If, however, the list has some elements, if the restoration index is outside the bounds this results in heap OOB read. We have patched the issue in GitHub commit 9e82dce6e6bd1f36a57e08fa85af213e2b2f2622. The fix will be included in TensorFlow 2.6.0. We will also cherrypick this commit on TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in supported range.
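The missing check described in the advisory can be illustrated with a small Python model. This is not TensorFlow's code and not the actual patch: it only models the idea of validating the restoration index against the number of supplied tensor names before using it. `select_tensor_name`, `restore_index`, and `InvalidArgumentError` are hypothetical names chosen for the sketch.

```python
from typing import Sequence


class InvalidArgumentError(ValueError):
    """Hypothetical stand-in for an invalid-argument error returned to the caller."""


def select_tensor_name(tensor_names: Sequence[str], restore_index: int) -> str:
    """Return the tensor name at restore_index, rejecting out-of-range requests.

    In the C++ kernel the unchecked lookup reads whatever memory sits at that
    offset. Python would raise IndexError (or silently wrap a negative index),
    so the explicit check below is what models the fix.
    """
    if restore_index < 0 or restore_index >= len(tensor_names):
        raise InvalidArgumentError(
            f"no tensor name at index {restore_index}; "
            f"{len(tensor_names)} name(s) were provided"
        )
    return tensor_names[restore_index]
```

An empty list corresponds to the null pointer dereference case in the advisory, and a non-empty list with an index past its end corresponds to the heap OOB read; the single bounds check rejects both.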

Exploitation Scenario

An insider or attacker with a compromised low-privilege account on a shared GPU training cluster invokes TensorFlow's raw restore tensor API with either an empty tensor name (null pointer dereference, process crash) or a tensor name that resolves to a short list paired with an out-of-bounds shard index. In the OOB read path, the attacker iterates shard indices to read adjacent heap allocations in the same process, potentially extracting model weights, in-flight training batches, or authentication tokens held in that process's memory. On a JupyterHub node where multiple data scientists share the same TF runtime, this enables cross-user data extraction without elevated privileges.

Weaknesses (CWE)

CWE-125 — Out-of-bounds Read: The product reads data past the end, or before the beginning, of the intended buffer.

  • [Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation.
  • [Architecture and Design] Use a language that provides appropriate memory abstractions.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
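The 7.8 base score follows from this vector. As a worked check, the sketch below applies the CVSS v3.1 base score equations and metric weights from the FIRST specification; the function names `roundup` and `base_score` are illustrative, and temporal and environmental metrics are not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on Scope (U = unchanged, C = changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector such as 'CVSS:3.1/AV:L/AC:L/...'."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    scope = metrics["S"]
    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[scope][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))
```

For this CVE the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum, roughly 7.71, rounds up to 7.8. The local attack vector (weight 0.55) is what keeps the score below the 9.8 that the same impact would receive with a network vector and no required privileges.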

Timeline

Published
August 12, 2021
Last Modified
November 21, 2024
First Seen
August 12, 2021

Related Vulnerabilities