CVE-2021-41219: TensorFlow: heap OOB in sparse matrix multiply

HIGH PoC AVAILABLE
Published November 5, 2021
CISO Take

TensorFlow versions prior to 2.7.0 contain a heap out-of-bounds access in sparse matrix multiplication triggered by crafting tensors with zero-dimension inputs. While local access is required, this is directly exploitable in shared training environments, Jupyter platforms, or multi-tenant ML infrastructure where users can submit custom operations. Upgrade to TensorFlow 2.7.0, 2.6.1, 2.5.2, or 2.4.4 immediately and audit shared ML environments for user-controlled sparse tensor inputs.

What is the risk?

Risk is HIGH for shared training infrastructure or Jupyter-based ML platforms where multiple users can execute TensorFlow code. For isolated single-user environments, practical risk is MEDIUM. Low attack complexity combined with low privilege requirements means exploitation is straightforward once local or platform-level access is achieved. No active exploitation evidence in the wild; not in CISA KEV. Organizations running shared GPU clusters or MLaaS offerings face the greatest exposure.

What systems are affected?

Package: TensorFlow
Ecosystem: pip
Vulnerable Range: versions prior to 2.7.0 that do not include the backported fix
Patched: 2.7.0, 2.6.1, 2.5.2, 2.4.4

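If you run a TensorFlow release that predates the fixed versions (2.7.0, or the patched 2.6.1, 2.5.2, and 2.4.4 on their release lines), you are affected.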
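A quick way to triage an environment is to compare the installed version string against the fixed releases listed in this advisory. The following is a minimal sketch, not an official tool: the names `PATCHED`, `parse_version`, and `is_patched` are hypothetical, and it assumes release lines older than 2.4 received no backport because the advisory names none.

```python
import re

# Fixed releases named in this advisory, keyed by (major, minor) release line.
PATCHED = {(2, 4): (2, 4, 4), (2, 5): (2, 5, 2), (2, 6): (2, 6, 1)}
FIRST_FIXED_LINE = (2, 7)  # 2.7.0 and later include the fix


def parse_version(text):
    """Extract (major, minor, patch) from a string such as '2.6.0' or '2.6.0rc1'.

    Pre-release suffixes are ignored, so a release candidate is treated like
    its final release. That is an assumption; verify pre-releases separately.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version_text):
    """Return True if the version is at or above the fix for its release line."""
    version = parse_version(version_text)
    if version[:2] >= FIRST_FIXED_LINE:
        return True
    fixed = PATCHED.get(version[:2])
    # Release lines with no listed backport (for example 2.3.x) count as unpatched.
    return fixed is not None and version >= fixed


if __name__ == "__main__":
    for candidate in ("2.3.4", "2.4.4", "2.5.1", "2.6.0", "2.6.1", "2.7.0"):
        print(candidate, "patched" if is_patched(candidate) else "VULNERABLE")
```

In a live environment the input would typically be `tf.__version__`; source builds that apply the fix commit by hand will still report the old version number, so treat a "VULNERABLE" result there as a prompt to check the build, not as proof.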

How severe is it?

CVSS 3.1: 7.8 / 10
EPSS: 0.2% chance of exploitation in 30 days (higher than 10% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Moderate
Exploitation Confidence: medium (public PoC indexed in trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

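Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High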

What should I do?

5 steps
  1. Upgrade TensorFlow to 2.7.0 (primary fix), or move to the patched releases 2.6.1, 2.5.2, or 2.4.4, which carry the cherry-picked fix, if a full upgrade is not immediately feasible.

  2. If version pinning constraints prevent upgrading, apply commit e6cf28c72ba2eb949ca950d834dd6d66bb01cfae to your source tree and rebuild TensorFlow.

  3. Validate tensor dimensions at pipeline ingestion boundaries before passing them to sparse ops, and reject any tensor with zero or negative dimensions.

  4. In multi-tenant ML platforms, isolate TensorFlow workloads per user or team until patching is complete.

  5. Detection: monitor TF worker processes for crashes or abnormal termination signals when processing sparse operations with atypical tensor shapes.
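The dimension check in step 3 can be sketched as a small guard that runs on shape tuples before any sparse op is invoked. This is an illustrative sketch only: `validate_shape` and `validate_matmul_operands` are hypothetical helper names, and the rule of also rejecting unknown (`None`) dimensions is an added assumption, not something the advisory requires.

```python
import operator


def validate_shape(shape, name="tensor"):
    """Return the shape as a tuple of ints, or raise ValueError.

    Rejects unknown (None), non-integer, zero, and negative dimensions.
    """
    checked = []
    for axis, dim in enumerate(shape):
        if dim is None:
            raise ValueError(f"{name}: dimension {axis} is unknown")
        try:
            size = operator.index(dim)  # accepts int-like values, rejects floats
        except TypeError:
            raise ValueError(
                f"{name}: dimension {axis} is not an integer: {dim!r}"
            ) from None
        if size <= 0:
            raise ValueError(f"{name}: dimension {axis} must be positive, got {size}")
        checked.append(size)
    return tuple(checked)


def validate_matmul_operands(a_shape, b_shape):
    """Validate two rank-2 operand shapes for a matrix multiply a @ b."""
    a = validate_shape(a_shape, "a")
    b = validate_shape(b_shape, "b")
    if len(a) != 2 or len(b) != 2:
        raise ValueError(f"expected rank-2 operands, got ranks {len(a)} and {len(b)}")
    if a[1] != b[0]:
        raise ValueError(f"inner dimensions differ: {a[1]} vs {b[0]}")
    return a, b


if __name__ == "__main__":
    print(validate_matmul_operands((3, 4), (4, 2)))
    try:
        validate_matmul_operands((3, 0), (0, 2))
    except ValueError as error:
        print("rejected:", error)
```

Working on plain shape tuples keeps the guard independent of the TensorFlow build, so it can sit in an ingestion service that does not load the library at all. It is a stopgap, not a substitute for upgrading.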

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.5 - AI system security
NIST AI RMF
MANAGE 2.2 - Mechanisms to sustain AI risk management are operational
OWASP LLM Top 10
LLM06 - Sensitive Information Disclosure

Frequently Asked Questions

Is CVE-2021-41219 actively exploited?

No active exploitation has been observed in the wild, and CVE-2021-41219 is not listed in CISA KEV. However, proof-of-concept exploit code is publicly available, which increases the risk of exploitation.

What systems are affected by CVE-2021-41219?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, shared ML platforms, model serving.

What is the CVSS score for CVE-2021-41219?

CVE-2021-41219 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.20%.

What is the AI security impact?

Affected AI Architectures

training pipelines, shared ML platforms, model serving

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0043 Craft Adversarial Data
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art. 15
ISO 42001: A.6.2.5
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM06

What are the technical details?

Original Advisory

TensorFlow is an open source platform for machine learning. In affected versions the code for sparse matrix multiplication is vulnerable to undefined behavior via binding a reference to `nullptr`. This occurs whenever the dimensions of `a` or `b` are 0 or less. If one of these is 0, an empty output tensor should be allocated (to preserve the invariant that output tensors are always allocated when the operation is successful) but nothing should be written to it (that is, we should return early from the kernel implementation). Otherwise, attempts to write to this empty tensor would result in heap OOB access. The fix will be included in TensorFlow 2.7.0. We will also cherrypick this commit on TensorFlow 2.6.1, TensorFlow 2.5.2, and TensorFlow 2.4.4, as these are also affected and still in supported range.
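The control flow the advisory describes (always allocate the output, then return early when a dimension is 0) can be illustrated with a small reference multiply over nested lists. This is a sketch of the pattern only, not TensorFlow's kernel code: `matmul_with_empty_guard` is a hypothetical name, and because Python is memory safe the guard here shows the intended logic rather than preventing an actual out-of-bounds access.

```python
def matmul_with_empty_guard(a, b, a_shape, b_shape):
    """Dense reference multiply a @ b over nested lists.

    a_shape and b_shape are explicit (rows, cols) pairs, because the shape of
    an empty nested list cannot be inferred from the list itself.
    """
    m, k = a_shape
    k_b, n = b_shape
    if min(m, k, k_b, n) < 0:
        raise ValueError("dimensions must be non-negative")
    if k != k_b:
        raise ValueError(f"inner dimensions differ: {k} vs {k_b}")

    # Always allocate the output, so the "output exists on success" invariant holds.
    out = [[0.0] * n for _ in range(m)]

    # Early return: with any zero dimension there is nothing to compute, and a
    # native kernel must not touch the (empty) buffers past this point.
    if m == 0 or n == 0 or k == 0:
        return out

    for i in range(m):
        for j in range(n):
            out[i][j] = sum(a[i][p] * b[p][j] for p in range(k))
    return out


if __name__ == "__main__":
    print(matmul_with_empty_guard([[1, 2]], [[3], [4]], (1, 2), (2, 1)))
    print(matmul_with_empty_guard([], [[1.0, 2.0]], (0, 1), (1, 2)))
```

When the shared inner dimension is 0 but the outer dimensions are not, the sketch returns an all-zero matrix of the expected shape, which matches the mathematical result of an empty sum.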

Exploitation Scenario

An adversary with access to a shared ML training platform — such as a data scientist on a multi-user Jupyter environment or a malicious insider on a GPU cluster — submits a training job invoking sparse matrix multiplication with a tensor where one dimension is set to 0. TensorFlow attempts to bind a reference to nullptr and then writes to the resulting empty output tensor, triggering heap OOB access. In a containerized training environment, this could crash the worker process (DoS to ongoing training runs) or, with additional heap grooming, achieve code execution within the TF process context, potentially exfiltrating model weights or training data resident in memory.

Weaknesses (CWE)

CWE-125 — Out-of-bounds Read: The product reads data past the end, or before the beginning, of the intended buffer.

  • [Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation. However, denylis
  • [Architecture and Design] Use a language that provides appropriate memory abstractions.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
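To show how this vector yields the 7.8 base score, the sketch below applies the CVSS v3.1 base-score equations and metric weights from the FIRST specification. The function names (`roundup`, `base_score`) are hypothetical, and only base metrics are handled; temporal and environmental metrics are out of scope.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is Unchanged (U) or Changed (C).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score for a vector string."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix!r}")
    metrics = dict(part.split(":") for part in parts)
    scope = metrics["S"]

    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15

    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[scope][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.8
```

For this CVE the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum of roughly 7.71 rounds up to 7.8. The Local attack vector is what keeps the exploitability term low.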

Timeline

Published: November 5, 2021
Last Modified: November 21, 2024
First Seen: November 5, 2021

Related Vulnerabilities