CVE-2021-37679: TensorFlow: heap over-read leaks memory via RaggedTensor

HIGH
Published August 12, 2021
CISO Take

Upgrade TensorFlow to 2.6.0, 2.5.1, 2.4.3, or 2.3.4 immediately on any ML training or serving infrastructure. Local exploitability limits blast radius, but any multi-tenant or shared training cluster is at elevated risk of heap memory exposure — which may contain model weights, training data, or credentials. This is fully patched; running unpatched versions is indefensible.

What is the risk?

The vulnerability scores CVSS 7.8 (High), with a local attack vector, low attack complexity, and low privileges required. Risk is amplified in shared ML training clusters where multiple users or teams operate on the same host. A low-privileged attacker can craft nested tf.map_fn calls over a RaggedTensor to read uninitialized heap memory, potentially exposing adjacent memory regions that hold sensitive model artifacts or data. The CVE is not in the CISA KEV catalog and there is no known active exploitation, but the trigger is a short script that needs no special tooling.

What systems are affected?

Package      Ecosystem   Vulnerable Range                    Patched
TensorFlow   pip         < 2.3.4; 2.4.0 to 2.4.2; 2.5.0      2.3.4, 2.4.3, 2.5.1, 2.6.0

If you run a TensorFlow release older than 2.3.4, a 2.4.x release before 2.4.3, or 2.5.0, you are affected.
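A quick way to triage hosts is to compare the installed TensorFlow version against the fixed releases named in the advisory. The sketch below is illustrative only: the helper names (`is_vulnerable`, `installed_tensorflow_version`) are hypothetical, pre-release suffixes such as `rc0` are ignored, and branches older than 2.3 are assumed to be unpatched because the advisory only promises fixes for 2.3 through 2.6.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# First fixed patch release on each maintained branch, per the advisory.
PATCHED = {(2, 3): 4, (2, 4): 3, (2, 5): 1}

# Distribution names that commonly ship TensorFlow (assumed list, not exhaustive).
DIST_NAMES = ("tensorflow", "tensorflow-cpu", "tensorflow-gpu")


def parse_version(text: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch); any pre-release suffix is ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_vulnerable(version: str) -> bool:
    """Return True if the version predates the fix for CVE-2021-37679."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= (2, 6):
        return False  # the fix shipped in 2.6.0
    fixed_patch = PATCHED.get((major, minor))
    if fixed_patch is None:
        return True  # assumption: unsupported older branches never got the fix
    return patch < fixed_patch


def installed_tensorflow_version() -> Optional[str]:
    """Return the installed TensorFlow version, or None if it is not installed."""
    for name in DIST_NAMES:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    found = installed_tensorflow_version()
    if found is None:
        print("TensorFlow is not installed in this environment")
    else:
        status = "VULNERABLE" if is_vulnerable(found) else "patched"
        print(f"TensorFlow {found}: {status}")
```

Run it inside each virtual environment or container image, since the result reflects only the interpreter that executes it.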

How severe is it?

CVSS 3.1: 7.8 / 10
EPSS: 0.2% chance of exploitation in 30 days (higher than 8% of all CVEs)
Exploitation Status: No known exploitation
Sophistication: Moderate

What is the attack surface?

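Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High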

What should I do?

5 steps
  1. Patch: Upgrade to TensorFlow 2.6.0, 2.5.1, 2.4.3, or 2.3.4.
  2. Audit: Search the codebase for tf.map_fn calls that accept RaggedTensor inputs without explicit fn signatures; prioritize data ingestion and preprocessing layers.
  3. Isolate: Enforce process-level and namespace isolation between tenants on shared ML training clusters.
  4. Harden CI/CD: Block downgrades to affected TF versions via dependency pinning and policy enforcement.
  5. Detect: Profile heap allocations on training servers; anomalous output tensor values may indicate exploitation attempts.
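The audit step can be partly automated with a static scan. The following is a minimal sketch, assuming that "explicit fn signature" corresponds to the `fn_output_signature` keyword of `tf.map_fn` (or its deprecated predecessor `dtype`, which is also the third positional parameter). The function name `find_unsigned_map_fn` is hypothetical. Static analysis cannot tell whether the input is a RaggedTensor, so every hit still needs manual review.

```python
import ast
from typing import List, Tuple

# Keywords that give tf.map_fn an explicit output signature.
SIGNATURE_KEYWORDS = {"fn_output_signature", "dtype"}


def find_unsigned_map_fn(source: str, filename: str = "<string>") -> List[Tuple[int, int]]:
    """Return (line, column) of each map_fn call that has no output signature."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both `tf.map_fn(...)` (attribute) and a bare `map_fn(...)` (name).
        if isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = getattr(func, "id", None)
        if name != "map_fn":
            continue
        keywords = {kw.arg for kw in node.keywords}
        # Three or more positional arguments means `dtype` was passed positionally.
        if keywords & SIGNATURE_KEYWORDS or len(node.args) >= 3:
            continue
        hits.append((node.lineno, node.col_offset))
    return sorted(hits)


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for line, col in find_unsigned_map_fn(handle.read(), path):
                print(f"{path}:{line}:{col}: map_fn call without output signature")
```

Pass it the Python files of the ingestion and preprocessing layers first, for example `python scan_map_fn.py pipeline/*.py`, where the script name is a placeholder.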

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2 - AI system design and development
NIST AI RMF
MANAGE 2.2 - Mechanisms to prevent AI risks from becoming realized
OWASP LLM Top 10
LLM06:2023 - Sensitive Information Disclosure

Frequently Asked Questions

What is CVE-2021-37679?

CVE-2021-37679 is a heap over-read in TensorFlow. When a `tf.map_fn` call is nested inside another `tf.map_fn`, the input is a `RaggedTensor`, and no function signature is provided, the code assumes the output is a fully specified tensor and fills the output buffer with uninitialized heap contents, which can leak other data held in memory. The issue is rated CVSS 7.8 (High) and is fixed in TensorFlow 2.6.0, 2.5.1, 2.4.3, and 2.3.4.

Is CVE-2021-37679 actively exploited?

No confirmed active exploitation of CVE-2021-37679 has been reported, but organizations should still patch proactively.

How to fix CVE-2021-37679?

  1. Patch: Upgrade to TensorFlow 2.6.0, 2.5.1, 2.4.3, or 2.3.4.
  2. Audit: Search the codebase for tf.map_fn calls that accept RaggedTensor inputs without explicit fn signatures; prioritize data ingestion and preprocessing layers.
  3. Isolate: Enforce process-level and namespace isolation between tenants on shared ML training clusters.
  4. Harden CI/CD: Block downgrades to affected TF versions via dependency pinning and policy enforcement.
  5. Detect: Profile heap allocations on training servers; anomalous output tensor values may indicate exploitation attempts.

What systems are affected by CVE-2021-37679?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, data preprocessing pipelines, shared ML platforms.

What is the CVSS score for CVE-2021-37679?

CVE-2021-37679 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.18%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, data preprocessing pipelines, shared ML platforms

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0025 Exfiltration via Cyber Means
AML.T0037 Data from Local System

Compliance Controls Affected

EU AI Act: Art. 15
ISO 42001: A.6.2
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM06:2023

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. In affected versions it is possible to nest a `tf.map_fn` within another `tf.map_fn` call. However, if the input tensor is a `RaggedTensor` and there is no function signature provided, code assumes the output is a fully specified tensor and fills output buffer with uninitialized contents from the heap. The `t` and `z` outputs should be identical, however this is not the case. The last row of `t` contains data from the heap which can be used to leak other memory information. The bug lies in the conversion from a `Variant` tensor to a `RaggedTensor`. The [implementation](https://github.com/tensorflow/tensorflow/blob/460e000de3a83278fb00b61a16d161b1964f15f4/tensorflow/core/kernels/ragged_tensor_from_variant_op.cc#L177-L190) does not check that all inner shapes match and this results in the additional dimensions. The same implementation can result in data loss, if input tensor is tweaked. We have patched the issue in GitHub commit 4e2565483d0ffcadc719bd44893fb7f609bb5f12. The fix will be included in TensorFlow 2.6.0. We will also cherrypick this commit on TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in supported range.
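The advisory attributes the bug to a missing check that all inner shapes match before per-row components are combined. The sketch below is a toy model of that kind of validation in plain Python, using nested lists in place of tensors. It is not the TensorFlow kernel and not the actual patch; `dense_shape` and `stack_components` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple


def dense_shape(value) -> Tuple[int, ...]:
    """Return the shape of a rectangular nested list; raise if it is jagged."""
    if not isinstance(value, list):
        return ()
    if not value:
        return (0,)
    child_shapes = {dense_shape(child) for child in value}
    if len(child_shapes) != 1:
        raise ValueError("nested list is not rectangular")
    return (len(value),) + child_shapes.pop()


def stack_components(components: List[list]) -> list:
    """Concatenate per-row value blocks, refusing mismatched inner shapes.

    Without this check, a consumer that sizes its output buffer from the
    first component would leave part of the buffer unwritten for the others.
    """
    # Compare everything after the leading (row-count) dimension.
    inner_shapes = {dense_shape(c)[1:] for c in components if c}
    if len(inner_shapes) > 1:
        raise ValueError(f"inner shapes do not match: {sorted(inner_shapes)}")
    stacked = []
    for component in components:
        stacked.extend(component)
    return stacked
```

With matching inner shapes, such as `[[1, 2], [3, 4]]` and `[[5, 6]]`, the components concatenate normally; with `[[1, 2, 3]]` and `[[4, 5]]` the function raises instead of producing an output whose declared shape is larger than the data written into it.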

Exploitation Scenario

An insider or an attacker with low-privilege access to a shared ML training cluster writes a script that nests tf.map_fn calls over a RaggedTensor input without a function signature. TensorFlow fills part of the output tensor with uninitialized memory from the process heap, and the attacker reads the output tensor to recover whatever earlier allocations left behind. Because a heap over-read stays within the address space of the TensorFlow process, exposure is greatest where tenants share a long-lived process (for example, a shared notebook kernel or serving worker). Leaked fragments could include another tenant's model weights, API keys loaded into memory by a secrets manager, or PII from a training batch. In HIPAA- or PCI-regulated training environments, such a leak may constitute a reportable data breach even though no network-level exploitation occurred.

Weaknesses (CWE)

CWE-681 — Incorrect Conversion between Numeric Types: When converting from one data type to another, such as long to integer, data can be omitted or translated in a way that produces unexpected values. If the resulting values are used in a sensitive context, then dangerous behaviors may occur.

  • [Implementation] Avoid making conversion between numeric types. Always check for the allowed ranges.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
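To see how this vector yields the 7.8 base score, the sketch below applies the CVSS v3.1 base-score equations and metric weights from the FIRST specification. It covers base metrics only (no temporal or environmental metrics), and the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on Scope (U = unchanged, C = changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.x base score from a vector string."""
    prefix, *parts = vector.split("/")
    if not prefix.startswith("CVSS:3"):
        raise ValueError("expected a CVSS v3.x vector")
    metrics = dict(part.split(":") for part in parts)
    scope = metrics["S"]

    # Impact sub-score: combines confidentiality, integrity, availability.
    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15

    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[scope][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.8
```

For this vector the exploitability sub-score is about 1.83 and the impact sub-score about 5.87; their sum, 7.71, rounds up to 7.8. The local attack vector is what keeps the score below the 9.x range despite high impact on all three of confidentiality, integrity, and availability.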

Timeline

Published: August 12, 2021
Last Modified: November 21, 2024
First Seen: August 12, 2021

Related Vulnerabilities