CVE-2021-41198: TensorFlow: tf.tile integer overflow crashes ML process

Severity: MEDIUM | PoC available
Published November 5, 2021
CISO Take

A local attacker with low privileges can crash an unpatched TensorFlow process by calling tf.tile with arguments whose output tensor would hold more elements than int64 can represent. The overflow is caught by a CHECK statement, which aborts the process. Upgrade to TensorFlow 2.4.4, 2.5.2, 2.6.1, or 2.7.0 (or later). The impact is limited to availability: the CVSS vector rates confidentiality and integrity impact as None.

What is the risk?

Medium risk in isolation. The local attack vector limits exposure to multi-tenant training infrastructure, shared ML workspaces, and systems that accept untrusted model or graph inputs. In Jupyter-based environments or on shared GPU clusters, a malicious notebook can crash co-tenant TF sessions that share its process. The flaw is not directly network-exploitable, but if TF is wrapped in a serving API that processes user-supplied tensor specs, the effective attack surface extends to the network.

What systems are affected?

Package      Ecosystem   Vulnerable Range                Patched
TensorFlow   pip         < 2.4.4, 2.5.0-2.5.1, 2.6.0     2.4.4, 2.5.2, 2.6.1, 2.7.0

If you run TensorFlow at a version earlier than the patched releases (2.4.4, 2.5.2, 2.6.1, 2.7.0), you are affected.

How severe is it?

CVSS 3.1: 5.5 / 10
EPSS: 0.2% chance of exploitation in 30 days (higher than 14% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

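Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High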

What should I do?

  1. Upgrade to TensorFlow 2.4.4, 2.5.2, 2.6.1, or 2.7.0 — patch at commit 9294094df6fea79271778eb7e7ae1bad8b5ef98f.

  2. If patching is not immediately possible, add input validation before every tf.tile call: reject requests where the output element count (the product of each input dimension times its multiple) would exceed INT64_MAX.

  3. In multi-tenant environments, enforce resource quotas and process isolation so a crashed session cannot affect others.

  4. Audit any serving layer that accepts external tensor dimensions, and reject requests whose dimensions, multiplied by their tile multiples, would overflow int64.

  5. Detection: monitor for unexpected TF process exits or CHECK-failure log lines containing 'tile'.
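The validation in steps 2 and 4 can be sketched as a small pre-flight check. This is a minimal sketch, not TensorFlow's own fix: the function name check_tile_args and the limit parameter are hypothetical, and it assumes the caller knows the input shape and the multiples as plain Python integers before tf.tile is invoked.

```python
import math

INT64_MAX = 2**63 - 1


def check_tile_args(input_shape, multiples, limit=INT64_MAX):
    """Return the output element count, or raise ValueError if it is unsafe.

    Hypothetical helper: input_shape and multiples are sequences of
    non-negative ints, one entry per dimension.
    """
    if len(input_shape) != len(multiples):
        raise ValueError("input_shape and multiples must have the same length")
    if any(v < 0 for v in list(input_shape) + list(multiples)):
        raise ValueError("dimensions and multiples must be non-negative")

    # Python ints have arbitrary precision, so this product cannot wrap.
    total = math.prod(d * m for d, m in zip(input_shape, multiples))
    if total > limit:
        raise ValueError(f"tile output would have {total} elements (limit {limit})")
    return total


# Intended use (assumption): call the check first, then tf.tile(x, multiples).
print(check_tile_args([2, 3], [4, 5]))  # 120
```

INT64_MAX is only the point at which the overflow occurs. A serving layer would probably pass a much lower limit that reflects available memory.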

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, robustness and cybersecurity of high-risk AI systems
ISO 42001
A.9.2 - AI system availability and resilience
NIST AI RMF
RE-1.1 - Reliability and robustness of AI systems
OWASP LLM Top 10
LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2021-41198?

CVE-2021-41198 is an integer overflow (CWE-190) in TensorFlow's tf.tile operation. When the requested output tensor would contain more elements than int64_t can represent, a CHECK statement detects the overflow and aborts the process, so a local user with low privileges can cause a denial of service. It is fixed in TensorFlow 2.4.4, 2.5.2, 2.6.1, and 2.7.0.

Is CVE-2021-41198 actively exploited?

This page does not report confirmed in-the-wild exploitation of CVE-2021-41198. Proof-of-concept exploit code is publicly available (indexed in trickest/cve), which increases the risk of exploitation.

How to fix CVE-2021-41198?

  1. Upgrade to TensorFlow 2.4.4, 2.5.2, 2.6.1, or 2.7.0 (patch at commit 9294094df6fea79271778eb7e7ae1bad8b5ef98f).

  2. If patching is not immediately possible, add input validation before every tf.tile call: reject requests where the output element count (the product of each input dimension times its multiple) would exceed INT64_MAX.

  3. In multi-tenant environments, enforce resource quotas and process isolation so a crashed session cannot affect others.

  4. Audit any serving layer that accepts external tensor dimensions, and reject requests whose dimensions, multiplied by their tile multiples, would overflow int64.

  5. Detection: monitor for unexpected TF process exits or CHECK-failure log lines containing 'tile'.

What systems are affected by CVE-2021-41198?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, data preprocessing pipelines, shared ML workspaces.

What is the CVSS score for CVE-2021-41198?

CVE-2021-41198 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.23%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, data preprocessing pipelines, shared ML workspaces

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service
AML.T0034 Cost Harvesting
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art. 15
ISO 42001: A.9.2
NIST AI RMF: RE-1.1
OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

TensorFlow is an open source platform for machine learning. In affected versions if `tf.tile` is called with a large input argument then the TensorFlow process will crash due to a `CHECK`-failure caused by an overflow. The number of elements in the output tensor is too much for the `int64_t` type and the overflow is detected via a `CHECK` statement. This aborts the process. The fix will be included in TensorFlow 2.7.0. We will also cherrypick this commit on TensorFlow 2.6.1, TensorFlow 2.5.2, and TensorFlow 2.4.4, as these are also affected and still in supported range.
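The arithmetic behind the overflow can be illustrated in plain Python. This is a sketch under assumptions: the shape and multiples below are hypothetical values chosen to land exactly on 2^63, and wrap_int64 is an illustrative helper that mimics two's-complement wraparound. It is not TensorFlow code.

```python
INT64_MAX = 2**63 - 1


def wrap_int64(value):
    """Reinterpret an arbitrary Python int as a two's-complement int64."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


# Hypothetical call: a 2x2 input tiled 2**31 times on axis 0 and 2**30 on axis 1.
input_shape = [2, 2]
multiples = [2**31, 2**30]

true_count = 1
for dim, mult in zip(input_shape, multiples):
    true_count *= dim * mult  # exact, because Python ints do not overflow

print(true_count)               # 9223372036854775808
print(true_count > INT64_MAX)   # True
print(wrap_int64(true_count))   # -9223372036854775808
```

The exact element count is one more than INT64_MAX, so a 64-bit signed counter cannot hold it. That is the condition the advisory describes the CHECK statement as detecting.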

Exploitation Scenario

An adversary with access to a shared ML training cluster (e.g., a compromised notebook user or rogue data scientist) submits a training job that calls tf.tile with a tensor shaped to produce an output with more than INT64_MAX elements. The TF process hits the CHECK assertion, crashes, and takes down any co-located training runs or serving replicas sharing that process. In a Kubernetes-based MLOps environment, this triggers repeated pod restarts, disrupting production inference serving during an outage window the adversary can time for maximum impact.
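The crash-and-restart pattern in this scenario might be spotted with a coarse log filter. The sketch below is an assumption-heavy illustration: the function name, the "check failed" marker, and the sample log lines are hypothetical, and the exact text TensorFlow emits for this failure is not confirmed here. Only the idea of matching CHECK-failure lines that mention 'tile' comes from the remediation guidance on this page.

```python
import re

# Assumed marker for CHECK failures; adjust to what your logs actually contain.
CHECK_FAILURE = re.compile(r"check failed", re.IGNORECASE)


def find_tile_check_failures(lines, keyword="tile"):
    """Return (line_number, text) pairs for CHECK-failure lines mentioning keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if CHECK_FAILURE.search(line) and keyword in line.lower():
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical sample log, invented for illustration only.
sample_log = [
    "I worker started",
    "F tile op: Check failed: overflow computing output size",
    "W slow step detected",
]

for number, text in find_tile_check_failures(sample_log):
    print(number, text)
```

A substring match on 'tile' is deliberately loose and will produce false positives, so it is best paired with the process-exit monitoring mentioned in the remediation steps.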

Weaknesses (CWE)

CWE-190 — Integer Overflow or Wraparound: The product performs a calculation that can produce an integer overflow or wraparound when the logic assumes that the resulting value will always be larger than the original value. This occurs when an integer value is incremented to a value that is too large to store in the associated representation. When this occurs, the value may become a very small or negative number.

  • [Requirements] Ensure that all protocols are strictly defined, such that all out-of-bounds behavior can be identified simply, and require strict conformance to the protocol.
  • [Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. If possible, choose a language or compiler that performs automatic bounds checking.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
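The 5.5 base score can be reproduced from this vector. The sketch below applies the CVSS v3.1 base-score equations and metric weights as published in the FIRST specification. The function names are illustrative, and temporal and environmental metrics are not handled.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on Scope (U = unchanged, C = changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score from a vector string."""
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    scope = metrics["S"]
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[metrics["C"]]) * (1 - cia[metrics["I"]]) * (1 - cia[metrics["A"]])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (8.22 * WEIGHTS["AV"][metrics["AV"]] * WEIGHTS["AC"][metrics["AC"]]
                      * PR_WEIGHTS[scope][metrics["PR"]] * WEIGHTS["UI"][metrics["UI"]])
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(total if scope == "U" else 1.08 * total, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

For this vector the impact subscore is about 3.6 and the exploitability subscore about 1.8. Their sum of roughly 5.43 rounds up to 5.5.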

Timeline

Published
November 5, 2021
Last Modified
November 21, 2024
First Seen
November 5, 2021

Related Vulnerabilities