CVE-2021-37635: TensorFlow: heap OOB read in sparse reduction ops

HIGH
Published August 12, 2021
CISO Take

TensorFlow's sparse reduction kernel fails to validate tensor index bounds, enabling heap out-of-bounds reads that can expose in-memory data (C:H) or crash the process (A:H). Any TensorFlow deployment older than the patched releases (2.6.0, 2.5.1, 2.4.3, 2.3.4) that processes sparse tensors is vulnerable. Patch immediately: shared ML infrastructure faces elevated risk from low-privilege insiders or compromised pipeline accounts that can submit crafted workloads.

What is the risk?

CVSS 7.1 High with low attack complexity and low privilege requirements — any user with code execution on a TF host can trigger this. The confidentiality impact is HIGH, meaning heap memory exposure can leak co-tenant data, model weights, or in-memory credentials. Availability is also HIGH via process crash. Not in CISA KEV and no confirmed active exploitation, but the low trigger barrier makes opportunistic exploitation plausible in multi-tenant ML environments.

What systems are affected?

Package | Ecosystem | Vulnerable Range | Patched
TensorFlow | pip | < 2.3.4; 2.4.0 to 2.4.2; 2.5.0 | 2.3.4, 2.4.3, 2.5.1, 2.6.0

If you run a TensorFlow release older than 2.3.4, or an unpatched 2.4.x or 2.5.x release, you are affected.
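As a rough aid for inventory checks, the following Python sketch classifies a version string against the patched releases named in the advisory (2.3.4, 2.4.3, 2.5.1, 2.6.0). The function names and the `PATCHED_BY_BRANCH` table are hypothetical, and the sketch assumes that branches older than 2.3 received no backport and that pre-release suffixes can be ignored.

```python
import re

# Patched releases named in the advisory, keyed by (major, minor) branch.
# Assumption: every earlier patch release on these branches is affected.
PATCHED_BY_BRANCH = {(2, 3): 4, (2, 4): 3, (2, 5): 1}
FIRST_FIXED_BRANCH = (2, 6)


def parse_version(text):
    """Return (major, minor, patch) from strings like '2.4.1' or '2.5.0rc1'."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_vulnerable(version_text):
    """Return True if the version predates the fix for its branch."""
    major, minor, patch = parse_version(version_text)
    if (major, minor) >= FIRST_FIXED_BRANCH:
        return False
    fixed_patch = PATCHED_BY_BRANCH.get((major, minor))
    if fixed_patch is None:
        # Assumption: branches older than 2.3 have no backport.
        return True
    return patch < fixed_patch
```

In an environment where TensorFlow is installed, the check could be called as `is_vulnerable(tf.__version__)`. Treat the result as a first pass only; vendor builds may carry the fix under a different version number.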

How severe is it?

CVSS 3.1: 7.1 / 10
EPSS: 0.2% chance of exploitation in 30 days (higher than 6% of all CVEs)
Exploitation Status: No known exploitation
Sophistication: Moderate

What is the attack surface?

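Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): None
Availability (A): High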

What should I do?

  1. Patch: Upgrade to TensorFlow 2.6.0, or, on a still-supported older branch, to the patched release for that branch: 2.5.1, 2.4.3, or 2.3.4.

  2. Workaround: Validate sparse tensor shapes and indices before passing to reduction ops; reject inputs where indices exceed the declared dense shape.

  3. Harden: Isolate TF workloads per tenant using containers or VMs to prevent cross-tenant memory exposure.

  4. Detect: Alert on unexpected OOM errors or segfaults in TF processes; monitor for anomalous sparse op usage patterns in shared training environments.
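The workaround in step 2 can be sketched in plain Python. This is a minimal illustration operating on lists rather than tensors, not TensorFlow's own validation; the function name `validate_sparse_reduce_inputs` is hypothetical, and the accepted range for reduction axes (negative values counting from the last axis) is an assumption.

```python
def validate_sparse_reduce_inputs(indices, values, dense_shape, reduction_axes):
    """Raise ValueError unless the sparse inputs are internally consistent.

    indices: list of coordinate lists, one per stored value
    values: list of stored values
    dense_shape: list of dimension sizes of the logical dense tensor
    reduction_axes: list of axes to reduce over (assumed range: [-rank, rank))
    """
    rank = len(dense_shape)
    if any(dim < 0 for dim in dense_shape):
        raise ValueError("dense_shape must not contain negative dimensions")
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")

    for row, index in enumerate(indices):
        if len(index) != rank:
            raise ValueError(
                f"index {row} has {len(index)} coordinates, expected {rank}"
            )
        for axis, (coord, dim) in enumerate(zip(index, dense_shape)):
            # Reject any coordinate outside the declared dense shape.
            if not 0 <= coord < dim:
                raise ValueError(
                    f"index {row} coordinate {coord} on axis {axis} "
                    f"is outside [0, {dim})"
                )

    for axis in reduction_axes:
        if not -rank <= axis < rank:
            raise ValueError(f"reduction axis {axis} is invalid for rank {rank}")
```

A caller would run this on untrusted inputs before invoking a sparse reduction op and drop or quarantine anything that raises. It reduces exposure on unpatched hosts but is not a substitute for upgrading.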

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Article 15 - Accuracy, robustness and cybersecurity
ISO 42001: A.6.2 - Information security for AI systems
NIST AI RMF: MANAGE 2.2 - Mechanisms for addressing AI risks and vulnerabilities

Frequently Asked Questions

What is CVE-2021-37635?

CVE-2021-37635 is a heap out-of-bounds read (CWE-125) in TensorFlow's sparse reduction operations. The kernel does not validate that each reduction group stays within bounds or that each index points inside the input tensor, so a crafted sparse tensor can expose in-memory data or crash the process. It is fixed in TensorFlow 2.6.0, 2.5.1, 2.4.3, and 2.3.4.

Is CVE-2021-37635 actively exploited?

No confirmed active exploitation of CVE-2021-37635 has been reported, but organizations should still patch proactively.

How to fix CVE-2021-37635?

  1. Patch: Upgrade to TensorFlow 2.6.0, or, on a still-supported older branch, to the patched release for that branch: 2.5.1, 2.4.3, or 2.3.4.

  2. Workaround: Validate sparse tensor shapes and indices before passing to reduction ops; reject inputs where indices exceed the declared dense shape.

  3. Harden: Isolate TF workloads per tenant using containers or VMs to prevent cross-tenant memory exposure.

  4. Detect: Alert on unexpected OOM errors or segfaults in TF processes; monitor for anomalous sparse op usage patterns in shared training environments.
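For the detection step, one possible approach is to launch each job as a child process and inspect how it exited. The sketch below relies on the documented POSIX behavior of Python's `subprocess` module, where a negative return code means the child was terminated by that signal. The names `classify_exit` and `run_job` are hypothetical, and wiring the result into an alerting system is left out.

```python
import signal
import subprocess
import sys


def classify_exit(returncode):
    """Describe a child's exit; on POSIX a negative code means a signal."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_job(argv, timeout=None):
    """Run a job in a separate process and report how it ended."""
    completed = subprocess.run(argv, timeout=timeout)
    return classify_exit(completed.returncode)


if __name__ == "__main__":
    # Example: run a trivial child; a real caller would pass the job command.
    print(run_job([sys.executable, "-c", "pass"]))
```

A result such as "killed by SIGSEGV" from a job that uses sparse reduction ops would be a reasonable trigger for closer inspection, though crashes have many other causes.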

What systems are affected by CVE-2021-37635?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML infrastructure, recommendation systems.

What is the CVSS score for CVE-2021-37635?

CVE-2021-37635 has a CVSS v3.1 base score of 7.1 (HIGH). The EPSS exploitation probability is 0.17%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, shared ML infrastructure, recommendation systems

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0011.000 Unsafe AI Artifacts
AML.T0043 Craft Adversarial Data

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.2
NIST AI RMF: MANAGE 2.2

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. In affected versions the implementation of sparse reduction operations in TensorFlow can trigger accesses outside of bounds of heap allocated data. The [implementation](https://github.com/tensorflow/tensorflow/blob/a1bc56203f21a5a4995311825ffaba7a670d7747/tensorflow/core/kernels/sparse_reduce_op.cc#L217-L228) fails to validate that each reduction group does not overflow and that each corresponding index does not point to outside the bounds of the input tensor. We have patched the issue in GitHub commit 87158f43f05f2720a374f3e6d22a7aaa3a33f750. The fix will be included in TensorFlow 2.6.0. We will also cherrypick this commit on TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in supported range.
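To illustrate why an unvalidated index leads to an out-of-bounds access, here is a small Python model of row-major addressing. It is a generic sketch, not TensorFlow's kernel code, and all function names are hypothetical.

```python
def element_count(shape):
    """Number of elements in a dense buffer of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def row_major_offset(index, shape):
    """Flat offset of `index` in a row-major buffer, with no bounds checks."""
    offset = 0
    for coord, dim in zip(index, shape):
        offset = offset * dim + coord
    return offset


def checked_offset(index, shape):
    """Same offset, but every coordinate is validated against its axis."""
    if len(index) != len(shape):
        raise IndexError("index rank does not match shape rank")
    for axis, (coord, dim) in enumerate(zip(index, shape)):
        if not 0 <= coord < dim:
            raise IndexError(f"coordinate {coord} out of range on axis {axis}")
    return row_major_offset(index, shape)


shape = (2, 3)                            # buffer holds 6 elements
print(element_count(shape))               # 6
print(row_major_offset((1, 5), shape))    # 8, past the end of the buffer
print(row_major_offset((0, 4), shape))    # 4, in range but the wrong element
```

The last two lines show the two failure modes of skipping validation: an offset beyond the allocation, and an offset that stays inside the buffer yet addresses a different element. Checking each coordinate against its own axis, as `checked_offset` does, rules out both.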

Exploitation Scenario

An adversary with low-privilege access to a shared ML training cluster (e.g., via compromised CI/CD service account or rogue insider) submits a TF training job containing deliberately crafted sparse reduction ops. The script constructs a SparseTensor with reduction group indices that overflow, causing TensorFlow to read heap memory outside the allocated buffer. The attacker captures out-of-bounds data via TF error output or side-channel, potentially recovering adjacent heap contents such as other tenants' model weights, hyperparameters, or cached authentication tokens. On single-tenant systems, the same technique achieves denial of service by crashing the training run.

Weaknesses (CWE)

CWE-125 — Out-of-bounds Read: The product reads data past the end, or before the beginning, of the intended buffer.

  • [Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation.
  • [Architecture and Design] Use a language that provides appropriate memory abstractions.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:H
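The 7.1 base score can be reproduced from this vector with the CVSS v3.1 base score equations. The sketch below is a compact Python rendering of those equations using the metric weights from the v3.1 specification; the function and table names are hypothetical, and temporal and environmental metrics are not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score for a full base vector string."""
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]

    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * pr
        * WEIGHTS["UI"][metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:H"))  # 7.1
```

For this vector the impact sub-score is about 5.18 and the exploitability sub-score about 1.83, so the sum of roughly 7.01 rounds up to 7.1.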

Timeline

Published: August 12, 2021
Last Modified: November 21, 2024
First Seen: August 12, 2021

Related Vulnerabilities