CVE-2021-37659: TensorFlow: heap OOB in cwise ops enables local RCE

HIGH
Published August 12, 2021
CISO Take

Upgrade TensorFlow to 2.6.0, 2.5.1, 2.4.3, or 2.3.4 on all training and inference infrastructure immediately. Although exploitation requires local access, shared ML platforms (Jupyter hubs, GPU clusters, containerized MLOps pipelines) routinely let low-privileged users run TensorFlow code, and any such user can trigger this bug. The resulting heap out-of-bounds reads and undefined behavior may let an attacker reach beyond model code isolation boundaries, up to host-level compromise.

What is the risk?

Effective risk is moderate-to-high in shared ML compute environments despite the local attack vector. CVSS 7.8 reflects full CIA impact (C:H/I:H/A:H) with low attack complexity and low privileges required: any user who can submit a TensorFlow job can attempt to exploit this. Shared GPU clusters, notebook platforms, and containerized training workers lower the local-access barrier, because they give many users local code execution by design. The CVE is not in the CISA KEV catalog and there is no confirmed active exploitation, but the patch has been public since 2021, so unpatched deployments carry an avoidable residual risk.

What systems are affected?

Package | Ecosystem | Vulnerable Range | Patched
TensorFlow | pip | < 2.3.4; 2.4.0 to 2.4.2; 2.5.0 | 2.3.4, 2.4.3, 2.5.1, 2.6.0

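Any environment running a TensorFlow release older than the patched version on its release line (2.3.4, 2.4.3, 2.5.1, or 2.6.0) is affected.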
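
To check a single environment against the patched releases, a minimal Python sketch follows. The helper names are hypothetical. It assumes that release lines older than 2.3 count as unpatched (the advisory names no fixed release for them), that pre-release suffixes such as `rc1` can be ignored, and that the package may be installed under the distribution names `tensorflow`, `tensorflow-cpu`, or `tensorflow-gpu`.

```python
from importlib import metadata

# Patched releases named in the advisory, keyed by (major, minor) release line.
PATCHED = {(2, 3): (2, 3, 4), (2, 4): (2, 4, 3), (2, 5): (2, 5, 1)}
FIRST_FIXED_LINE = (2, 6)  # 2.6.0 and later include the fix


def parse_version(text):
    """Turn '2.5.0' or '2.5.0rc1' into three ints (pre-release tags are dropped)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version_text):
    """Return True if the version is at or above the fix for its release line."""
    version = parse_version(version_text)
    line = version[:2]
    if line >= FIRST_FIXED_LINE:
        return True
    fixed = PATCHED.get(line)
    # Assumption: lines with no named fix (for example 2.2.x) are unpatched.
    return fixed is not None and version >= fixed


def installed_tensorflow_version():
    """Return the installed TensorFlow version string, or None if absent."""
    for dist in ("tensorflow", "tensorflow-cpu", "tensorflow-gpu"):
        try:
            return metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    found = installed_tensorflow_version()
    if found is None:
        print("TensorFlow is not installed in this environment")
    else:
        status = "patched" if is_patched(found) else "VULNERABLE"
        print(f"TensorFlow {found}: {status}")
```

Running the script inside each container image or virtual environment gives a per-environment verdict; a real fleet audit would more likely feed the same comparison from an image scanner or SBOM.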

How severe is it?

CVSS 3.1
7.8 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 7% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Moderate

What is the attack surface?

AV AC PR UI S C I A
AV Local
AC Low
PR Low
UI None
S Unchanged
C High
I High
A High

What should I do?

  1. Patch immediately: upgrade to TensorFlow 2.6.0 or later, or to the patched release on an older line (2.5.1, 2.4.3, or 2.3.4). The fix is commit 93f428fd1768df147171ed674fee1fc5ab8309ec.

  2. Audit all TF deployments: scan CI/CD runners, Jupyter environments, and container images for the installed version with `pip show tensorflow` or `pip3 show tensorflow`.

  3. Enforce tensor shape validation at pipeline ingestion points before ops execute to reduce attack surface.

  4. Run training jobs under dedicated least-privilege service accounts to contain blast radius if exploited.

  5. Detection: monitor for SIGSEGV/SIGABRT in TF worker logs and unexpected core dumps from training processes; heap OOB often manifests as intermittent crashes before controlled exploitation.
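
The detection step can be sketched as a simple log scan. This is an illustration only: the function name, the pattern list, and the sample log lines are assumptions, and real worker logs will vary by launcher and platform.

```python
import re

# Strings that commonly accompany a native crash; extend to match local log formats.
CRASH_PATTERN = re.compile(
    r"\b(SIGSEGV|SIGABRT|Segmentation fault|core dumped)\b", re.IGNORECASE
)


def find_crash_lines(lines):
    """Return (line_number, text) pairs for log lines that mention a crash signal."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if CRASH_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Hypothetical log excerpt used only to show the function in action.
    sample = [
        "step 100 loss=0.41",
        "Fatal Python error: Segmentation fault",
        "worker-3 terminated by signal SIGABRT",
        "step 101 loss=0.40",
    ]
    for number, text in find_crash_lines(sample):
        print(f"line {number}: {text}")
```

A crash match is only a signal to investigate: segmentation faults in training jobs have many benign causes, so correlating hits with the submitting user and the job's input shapes is what makes the alert useful.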

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
Article 9 - Risk management system
ISO 42001
A.10.3 - Third-party AI components
A.6.1.4 - AI system risk assessment
NIST AI RMF
GOVERN-1.1 - Policies for AI risk management
MANAGE-2.2 - Mechanisms to respond to AI risks
OWASP LLM Top 10
LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2021-37659?

CVE-2021-37659 is a vulnerability in TensorFlow's binary element-wise (cwise) operations that do not require broadcasting, such as the gradients of binary cwise operations. The kernel assumes both inputs have exactly the same number of elements but does not check it, so mismatched inputs cause heap out-of-bounds reads and undefined behavior from binding a reference to a null pointer. It is fixed in TensorFlow 2.6.0, 2.5.1, 2.4.3, and 2.3.4.

Is CVE-2021-37659 actively exploited?

No confirmed active exploitation of CVE-2021-37659 has been reported, but organizations should still patch proactively.

How to fix CVE-2021-37659?

1. Patch immediately: upgrade to TensorFlow 2.6.0 or later, or to the patched release on an older line (2.5.1, 2.4.3, or 2.3.4). The fix is commit 93f428fd1768df147171ed674fee1fc5ab8309ec.
2. Audit all TF deployments: scan CI/CD runners, Jupyter environments, and container images with `pip show tensorflow` or `pip3 show tensorflow`.
3. Enforce tensor shape validation at pipeline ingestion points before ops execute to reduce attack surface.
4. Run training jobs under dedicated least-privilege service accounts to contain blast radius if exploited.
5. Detection: monitor for SIGSEGV/SIGABRT in TF worker logs and unexpected core dumps from training processes; heap OOB often manifests as intermittent crashes before controlled exploitation.

What systems are affected by CVE-2021-37659?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, MLOps platforms, Jupyter/notebook environments.

What is the CVSS score for CVE-2021-37659?

CVE-2021-37659 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.18%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, MLOps platforms, Jupyter/notebook environments

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0044 Full AI Model Access
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15, Article 9
ISO 42001: A.10.3, A.6.1.4
NIST AI RMF: GOVERN-1.1, MANAGE-2.2
OWASP LLM Top 10: LLM05

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. In affected versions an attacker can cause undefined behavior via binding a reference to null pointer in all binary cwise operations that don't require broadcasting (e.g., gradients of binary cwise operations). The [implementation](https://github.com/tensorflow/tensorflow/blob/84d053187cb80d975ef2b9684d4b61981bca0c41/tensorflow/core/kernels/cwise_ops_common.h#L264) assumes that the two inputs have exactly the same number of elements but does not check that. Hence, when the eigen functor executes it triggers heap OOB reads and undefined behavior due to binding to nullptr. We have patched the issue in GitHub commit 93f428fd1768df147171ed674fee1fc5ab8309ec. The fix will be included in TensorFlow 2.6.0. We will also cherrypick this commit on TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in supported range.
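
The missing check the advisory describes, that both inputs hold the same number of elements, can be approximated as a guard at the point where a pipeline accepts tensors. The sketch below is not the upstream patch: it is a hypothetical Python helper that works on plain shape tuples (for example the `shape` of a NumPy array), and it only reduces exposure on unpatched systems rather than replacing the upgrade.

```python
from math import prod


def element_count(shape):
    """Number of elements implied by a shape tuple; () is a scalar with one element."""
    if any((not isinstance(dim, int)) or dim < 0 for dim in shape):
        raise ValueError(f"invalid shape {shape!r}: dimensions must be non-negative ints")
    return prod(shape)


def check_same_element_count(lhs_shape, rhs_shape):
    """Raise ValueError unless both shapes describe the same number of elements."""
    lhs, rhs = element_count(lhs_shape), element_count(rhs_shape)
    if lhs != rhs:
        raise ValueError(
            f"element count mismatch: {lhs_shape!r} has {lhs}, {rhs_shape!r} has {rhs}"
        )


if __name__ == "__main__":
    check_same_element_count((2, 3), (3, 2))  # same count (6), passes silently
    try:
        check_same_element_count((2, 3), (4,))
    except ValueError as err:
        print(f"rejected: {err}")
```

Comparing element counts rather than exact shapes mirrors the assumption quoted in the advisory; a stricter pipeline could require identical shapes instead.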

Exploitation Scenario

An attacker with low-privilege access to a shared GPU training cluster (for example, a compromised data scientist account or a malicious CI pipeline contribution) submits a TensorFlow training job that invokes a binary element-wise operation, such as a custom gradient layer, with two tensors of deliberately mismatched sizes. Because TF's cwise kernel assumes both inputs have the same number of elements without validating it, the Eigen functor binds a reference to a null pointer and reads heap memory beyond the allocated tensor. This can leak adjacent heap contents (model weights, auth tokens, neighboring tenant data on a multi-tenant cluster) and could be chained with heap grooming to achieve code execution on the training host, potentially escaping a containerized ML workload to compromise the underlying node.

Weaknesses (CWE)

CWE-125 — Out-of-bounds Read: The product reads data past the end, or before the beginning, of the intended buffer.

  • [Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation.
  • [Architecture and Design] Use a language that provides appropriate memory abstractions.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
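
As a worked check, the 7.8 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The sketch below uses the metric weights from the CVSS v3.1 specification and handles only Scope: Unchanged vectors, which is all this CVE needs; the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged values for PR).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    fields = dict(part.split(":") for part in vector.split("/")[1:])
    if fields["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][fields[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # prints 7.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum, roughly 7.71, rounds up to 7.8.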

Timeline

Published
August 12, 2021
Last Modified
November 21, 2024
First Seen
August 12, 2021

Related Vulnerabilities