CVE-2021-37690: TensorFlow: use-after-free crashes training processes

MEDIUM
Published August 13, 2021
CISO Take

A use-after-free in TensorFlow's shape inference engine allows a local attacker with minimal privileges to crash TF processes via segfault. Patch to TF 2.6.0, 2.5.1, 2.4.3, or 2.3.4 immediately on training infrastructure. No active exploitation known, but unpatched training clusters are exposed to denial-of-service against long-running jobs.

What is the risk?

Medium risk in practice. CVSS 6.6 with local access vector limits remote exploitation; an attacker needs a foothold on the machine running TensorFlow. The availability impact is high (A:H) — a crash terminates training runs — but confidentiality and integrity impact are low. Not in CISA KEV and no public exploit code observed. Risk elevates in multi-tenant ML platforms (e.g., shared Jupyter environments, Kubeflow clusters) where low-privileged users co-exist with production training workloads.

What systems are affected?

Package | Ecosystem | Vulnerable Range | Patched
TensorFlow | pip | not listed | 2.3.4, 2.4.3, 2.5.1, 2.6.0

Do you use TensorFlow? You're affected if you run a release older than 2.3.4, 2.4.3, 2.5.1, or 2.6.0 on its respective release line.
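One way to check a host against the fixed releases named in the advisory is a small version comparison. The sketch below is illustrative only: the helper names are hypothetical, the list of pip distribution names (`tensorflow`, `tensorflow-cpu`, `tensorflow-gpu`) is an assumption about how TensorFlow may be installed, release lines older than 2.3 are assumed to be unpatched, and pre-release suffixes such as `rc0` are ignored.

```python
import re
from importlib import metadata

# Minimum patched patch level per release line, taken from the advisory text.
PATCHED_MINIMUMS = {(2, 3): 4, (2, 4): 3, (2, 5): 1}
# From this release line onward the fix is included (2.6.0 and later).
FIRST_FIXED_LINE = (2, 6)


def parse_version(version):
    """Extract (major, minor, patch) from a version string like '2.5.1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version):
    """Return True if the version contains the fix for CVE-2021-37690."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= FIRST_FIXED_LINE:
        return True
    minimum = PATCHED_MINIMUMS.get((major, minor))
    # Assumption: lines older than 2.3 received no fix, so treat them as unpatched.
    return minimum is not None and patch >= minimum


def installed_tensorflow_version():
    """Return the installed TensorFlow version string, or None if not found."""
    # Assumption: these are the distribution names worth probing.
    for dist in ("tensorflow", "tensorflow-cpu", "tensorflow-gpu"):
        try:
            return metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    found = installed_tensorflow_version()
    if found is None:
        print("TensorFlow is not installed in this environment")
    else:
        status = "patched" if is_patched(found) else "VULNERABLE"
        print(f"TensorFlow {found}: {status}")
```

Running the script inside each training image or virtual environment reports the state of that environment only; it does not inspect other containers or hosts.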

How severe is it?

CVSS 3.1
6.6 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 6% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Moderate

What is the attack surface?

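Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): Low
Integrity (I): Low
Availability (A): High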

What should I do?

5 steps
  1. Patch: Upgrade to TensorFlow 2.6.0, 2.5.1, 2.4.3, or 2.3.4 — all include commit ee119d4a.

  2. No viable workaround exists short of patching; avoid running untrusted TF graphs as a defense-in-depth measure.

  3. Isolate training environments: restrict who can submit training jobs to multi-tenant clusters.

  4. For containerized workloads, rebuild and redeploy ML containers with patched base images.

  5. Detection: monitor for unexpected TF process crashes (SIGSEGV) in training logs — repeated segfaults on hash table ops may indicate active probing.
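The detection step can be approximated by wrapping job launches and classifying how the process ended. This is a minimal sketch for POSIX systems, not a monitoring product: `classify_exit` and `run_job` are hypothetical helper names, and the `128 + signal` branch assumes the job was started through a shell that reports signals that way.

```python
import signal
import subprocess
import sys


def classify_exit(returncode):
    """Map a process return code to a short status label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
        if signum == signal.SIGSEGV:
            return "segfault"
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    if returncode == 128 + signal.SIGSEGV:
        # Assumption: a shell wrapper encoded the signal as 128 + signum.
        return "segfault"
    return f"exit code {returncode}"


def run_job(command):
    """Run a training command and return its classified exit status."""
    completed = subprocess.run(command, check=False)
    return classify_exit(completed.returncode)


if __name__ == "__main__":
    # Example: wrap whatever command follows the script name.
    status = run_job(sys.argv[1:] or [sys.executable, "-c", "pass"])
    print(f"job status: {status}")
```

A scheduler hook could count `segfault` results per submitting user and alert when the same user triggers several in a short window, which is the probing pattern the detection step describes.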

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 9 - Risk management system
ISO 42001
A.6.2.6 - AI system security
NIST AI RMF
GOVERN 5.2 - Organizational teams are committed to a culture that considers and communicates AI risk
MANAGE 2.2 - Mechanisms to address identified AI risks

Frequently Asked Questions

What is CVE-2021-37690?

CVE-2021-37690 is a use-after-free (CWE-416) in TensorFlow's shape inference code. Shape functions such as `MutableHashTableShape` return `ShapeAndType` data whose shapes are owned by an inference context that is freed almost immediately, so later reads can segfault the process. A local attacker with low privileges can use this to crash TF processes. It is fixed in TensorFlow 2.6.0, 2.5.1, 2.4.3, and 2.3.4.

Is CVE-2021-37690 actively exploited?

No confirmed active exploitation of CVE-2021-37690 has been reported, but organizations should still patch proactively.

How to fix CVE-2021-37690?

  1. Patch: Upgrade to TensorFlow 2.6.0, 2.5.1, 2.4.3, or 2.3.4; all include commit ee119d4a.

  2. No viable workaround exists short of patching; avoid running untrusted TF graphs as a defense-in-depth measure.

  3. Isolate training environments: restrict who can submit training jobs to multi-tenant clusters.

  4. For containerized workloads, rebuild and redeploy ML containers with patched base images.

  5. Detection: monitor for unexpected TF process crashes (SIGSEGV) in training logs; repeated segfaults on hash table ops may indicate active probing.

What systems are affected by CVE-2021-37690?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, MLOps platforms.

What is the CVSS score for CVE-2021-37690?

CVE-2021-37690 has a CVSS v3.1 base score of 6.6 (MEDIUM). The EPSS exploitation probability is 0.16%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, MLOps platforms

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0029 Denial of AI Service
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 9
ISO 42001: A.6.2.6
NIST AI RMF: GOVERN 5.2, MANAGE 2.2

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. In affected versions when running shape functions, some functions (such as `MutableHashTableShape`) produce extra output information in the form of a `ShapeAndType` struct. The shapes embedded in this struct are owned by an inference context that is cleaned up almost immediately; if the upstream code attempts to access this shape information, it can trigger a segfault. `ShapeRefiner` is mitigating this for normal output shapes by cloning them (and thus putting the newly created shape under ownership of an inference context that will not die), but we were not doing the same for shapes and types. This commit fixes that by doing similar logic on output shapes and types. We have patched the issue in GitHub commit ee119d4a498979525046fba1c3dd3f13a039fbb1. The fix will be included in TensorFlow 2.6.0. We will also cherrypick this commit on TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in supported range.
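The ownership problem the advisory describes can be illustrated with a toy model. The code below is not TensorFlow code and does not reproduce the crash: `ToyContext` and `ToyShape` are hypothetical classes, and a Python exception stands in for the segfault that a dangling pointer causes in C++. It only shows why cloning a shape into a longer-lived context before the short-lived one is released avoids the invalid read.

```python
class ToyContext:
    """Stand-in for an inference context that owns the shapes it creates."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def make_shape(self, dims):
        return ToyShape(self, tuple(dims))

    def close(self):
        # After this point, shapes owned by this context must not be read.
        self.alive = False


class ToyShape:
    """Stand-in for a shape handle that is only valid while its owner lives."""

    def __init__(self, owner, dims):
        self._owner = owner
        self._dims = dims

    def dims(self):
        if not self._owner.alive:
            # In C++ this read would be undefined behavior (often a segfault).
            raise RuntimeError("shape read after its owning context was released")
        return self._dims

    def clone_into(self, context):
        """Copy the shape so that the copy is owned by another context."""
        return context.make_shape(self.dims())


def demo():
    long_lived = ToyContext("refiner")
    short_lived = ToyContext("per-node")

    shape = short_lived.make_shape([2, 3])
    cloned = shape.clone_into(long_lived)  # the pattern the fix applies
    short_lived.close()

    try:
        shape.dims()  # unfixed path: handle outlived its owner
        dangling_read_failed = False
    except RuntimeError:
        dangling_read_failed = True

    return cloned.dims(), dangling_read_failed
```

Calling `demo()` returns the cloned dimensions together with a flag showing that the uncloned handle became unusable once its context was closed.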

Exploitation Scenario

An attacker with low-privileged access to a shared ML training cluster submits a TensorFlow graph containing a MutableHashTable operation with crafted shape inputs. When TensorFlow's ShapeRefiner evaluates the graph during session initialization or shape inference, the MutableHashTableShape function returns ShapeAndType structs whose shapes are owned by an inference context that is freed almost immediately. Accessing those dangling shapes triggers a segfault that kills the TensorFlow process. On a multi-tenant platform such as Kubeflow or MLflow, any training job running in that process is terminated, which loses uncheckpointed progress and wastes the GPU time already spent.

Weaknesses (CWE)

CWE-416 — Use After Free: The product reuses or references memory after it has been freed. At some point afterward, the memory may be allocated again and saved in another pointer, while the original pointer references a location somewhere within the new allocation. Any operations using the original pointer are no longer valid because the memory "belongs" to the code that operates on the new pointer.

  • [Architecture and Design] Choose a language that provides automatic memory management.
  • [Implementation] When freeing pointers, be sure to set them to NULL once they are freed. However, the utilization of multiple or complex data structures may lower the usefulness of this strategy.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:L/I:L/A:H
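The 6.6 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The following is a compact sketch of those equations (metric weights and the `roundup` rule as published in the CVSS v3.1 specification); the function names are illustrative, and temporal and environmental metrics are not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is Unchanged (U) or Changed (C).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value):
    """Round up to one decimal place using the integer method from the spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def parse_vector(vector):
    """Split 'CVSS:3.1/AV:L/...' into a {metric: value} dictionary."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix!r}")
    return dict(part.split(":", 1) for part in parts)


def base_score(vector):
    """Compute the CVSS v3.1 base score for a base-metric vector string."""
    m = parse_vector(vector)
    scope = m["S"]
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:L/I:L/A:H"))  # 6.6
```

For this vector the impact sub-score is about 4.70 and the exploitability sub-score about 1.83, so the score is driven mostly by the High availability impact rather than by ease of attack.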

Timeline

Published
August 13, 2021
Last Modified
November 21, 2024
First Seen
August 13, 2021

Related Vulnerabilities