CVE-2021-41198: TensorFlow tf.tile integer overflow

CISO Take

A local attacker with minimal privileges can crash a TensorFlow process by calling tf.tile with arguments large enough that the output element count overflows int64_t, which triggers a CHECK failure and aborts the process. Upgrade to TensorFlow 2.4.4, 2.5.2, 2.6.1, or 2.7.0 (or later). The impact is limited to availability: the CVSS vector rates confidentiality and integrity impact as None.

What is the risk?

Medium risk in isolation. The local attack vector limits exposure to multi-tenant training infrastructure, shared ML workspaces, and systems that accept untrusted model or graph inputs. In Jupyter-based environments or shared GPU clusters, a malicious notebook can crash co-tenant TF sessions. The flaw is not directly network-exploitable, but if TF sits behind a serving API that processes user-supplied tensor specs, the effective attack surface extends to the network.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	before 2.4.4; 2.5.x before 2.5.2; 2.6.x before 2.6.1	2.4.4, 2.5.2, 2.6.1, 2.7.0

How severe is it?

CVSS 3.1: 5.5 / 10

EPSS: 0.2% chance of exploitation in 30 days (higher than 14% of all CVEs)

Source: EPSS v3, FIRST.org

Exploitation Status

Exploit availability: public PoC indexed (trickest/cve)

Exploitation: MEDIUM

Sophistication: Trivial

Exploitation confidence: medium

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Local

AC Low

PR Low

UI None

S Unchanged

C None

I None

A High

What should I do?

1. Upgrade to TensorFlow 2.4.4, 2.5.2, 2.6.1, or 2.7.0. The fix is commit 9294094df6fea79271778eb7e7ae1bad8b5ef98f.
2. If patching is not immediately possible, validate inputs before calling tf.tile: reject any call where the output element count (the product of each input dimension times its multiple) would exceed INT64_MAX.
3. In multi-tenant environments, enforce resource quotas and process isolation so a crashed session cannot affect others.
4. Audit any serving layer that accepts external tensor dimensions, and reject requests whose dimensions and multiples would overflow int64 when multiplied together.
5. Detection: monitor for unexpected TF process exits or CHECK-failure log lines containing 'tile'.
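
The input validation described in steps 2 and 4 could look roughly like the sketch below. It is a minimal illustration, not part of the official fix: the helper names `tile_output_elements` and `check_tile_request` are hypothetical, and the check works on plain Python integers so that it can run before any shape is handed to TensorFlow. In practice a much lower `max_elements` limit than INT64_MAX is probably appropriate, since memory runs out long before the overflow threshold.

```python
import math

INT64_MAX = 2**63 - 1  # largest value representable in int64_t


def tile_output_elements(input_shape, multiples):
    """Return the number of elements a tile of input_shape by multiples would have."""
    if len(input_shape) != len(multiples):
        raise ValueError("multiples must have one entry per input dimension")
    if any(d < 0 for d in input_shape) or any(m < 0 for m in multiples):
        raise ValueError("dimensions and multiples must be non-negative")
    # Python integers are arbitrary precision, so this product cannot overflow.
    return math.prod(d * m for d, m in zip(input_shape, multiples))


def check_tile_request(input_shape, multiples, max_elements=INT64_MAX):
    """Raise ValueError if the tiled output would exceed max_elements (hypothetical helper)."""
    n = tile_output_elements(input_shape, multiples)
    if n > max_elements:
        raise ValueError(f"tile output of {n} elements exceeds limit {max_elements}")
    return n
```

A caller would run `check_tile_request(shape, multiples)` on user-supplied values and only invoke tf.tile if no exception is raised.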

How is it classified?

DoS Framework Inference Training Data

AML.T0029 - Denial of AI Service

AML.T0034 - Cost Harvesting

AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Art. 15 - Accuracy, robustness and cybersecurity of high-risk AI systems

ISO 42001

A.9.2 - AI system availability and resilience

NIST AI RMF

RE-1.1 - Reliability and robustness of AI systems

OWASP LLM Top 10

LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2021-41198?

CVE-2021-41198 is an integer overflow (CWE-190) in TensorFlow's tf.tile. When the call would produce an output tensor with more elements than int64_t can represent, a CHECK statement detects the overflow and aborts the process. A local attacker with minimal privileges can use this to crash a TensorFlow process. The impact is limited to availability, and the fix ships in TensorFlow 2.4.4, 2.5.2, 2.6.1, and 2.7.0.

Is CVE-2021-41198 actively exploited?

The only exploitation signal listed for CVE-2021-41198 is a publicly indexed proof-of-concept (trickest/cve). Public PoC code lowers the bar for exploitation, and the composite exploitation rating is medium.

How to fix CVE-2021-41198?

1. Upgrade to TensorFlow 2.4.4, 2.5.2, 2.6.1, or 2.7.0. The fix is commit 9294094df6fea79271778eb7e7ae1bad8b5ef98f.
2. If patching is not immediately possible, validate inputs before calling tf.tile: reject any call where the output element count (the product of each input dimension times its multiple) would exceed INT64_MAX.
3. In multi-tenant environments, enforce resource quotas and process isolation so a crashed session cannot affect others.
4. Audit any serving layer that accepts external tensor dimensions, and reject requests whose dimensions and multiples would overflow int64 when multiplied together.
5. Detection: monitor for unexpected TF process exits or CHECK-failure log lines containing 'tile'.
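
The detection step could be prototyped with a simple log filter like the one below. This is a sketch under stated assumptions: it assumes CHECK failures appear in logs with wording such as "Check failed" or "CHECK-failure", the function name `find_tile_check_failures` is hypothetical, and the sample lines are invented for illustration rather than copied from real TensorFlow output. Matching on the substring "tile" will also catch unrelated words, so expect some false positives.

```python
import re

# Assumption: CHECK failures are logged with wording like "Check failed"
# or "CHECK-failure". Adjust the patterns to your deployment's log format.
CHECK_FAILED = re.compile(r"check[ -]?fail", re.IGNORECASE)
TILE = re.compile(r"tile", re.IGNORECASE)


def find_tile_check_failures(lines):
    """Return log lines that mention both a CHECK failure and 'tile'."""
    return [
        line.rstrip("\n")
        for line in lines
        if CHECK_FAILED.search(line) and TILE.search(line)
    ]


# Hypothetical log lines, for illustration only.
sample = [
    "I serving: request ok\n",
    "F ops: Check failed: overflow in tile output shape\n",
    "W tile: slow kernel\n",
]
hits = find_tile_check_failures(sample)
```

The same function accepts an open file object, so it can be pointed at a log file or a stream read line by line.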

What systems are affected by CVE-2021-41198?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, data preprocessing pipelines, shared ML workspaces.

What is the CVSS score for CVE-2021-41198?

CVE-2021-41198 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.23%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, data preprocessing pipelines, shared ML workspaces

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service

AML.T0034 Cost Harvesting

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art. 15

ISO 42001: A.9.2

NIST AI RMF: RE-1.1

OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

TensorFlow is an open source platform for machine learning. In affected versions if `tf.tile` is called with a large input argument then the TensorFlow process will crash due to a `CHECK`-failure caused by an overflow. The number of elements in the output tensor is too much for the `int64_t` type and the overflow is detected via a `CHECK` statement. This aborts the process. The fix will be included in TensorFlow 2.7.0. We will also cherrypick this commit on TensorFlow 2.6.1, TensorFlow 2.5.2, and TensorFlow 2.4.4, as these are also affected and still in supported range.

Exploitation Scenario

An adversary with access to a shared ML training cluster (e.g., a compromised notebook user or rogue data scientist) submits a training job that calls tf.tile with arguments chosen so the output would have more than INT64_MAX elements. The TF process hits the CHECK assertion and aborts, taking down any co-located training runs or serving replicas that share that process. In a Kubernetes-based MLOps environment, this triggers repeated pod restarts and disrupts production inference serving at a time the adversary chooses for maximum impact.

Weaknesses (CWE)

CWE-190 Integer Overflow or Wraparound (Primary)

CWE-190 — Integer Overflow or Wraparound: The product performs a calculation that can produce an integer overflow or wraparound when the logic assumes that the resulting value will always be larger than the original value. This occurs when an integer value is incremented to a value that is too large to store in the associated representation. When this occurs, the value may become a very small or negative number.
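
To make the wraparound concrete, the sketch below simulates 64-bit two's-complement arithmetic in Python, whose own integers never overflow. It is illustrative only: `wrap_int64` and `checked_mul_int64` are hypothetical helpers, and the simulation assumes the wrapping behavior typical of two's-complement hardware, which is not something C++ guarantees for signed overflow.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def wrap_int64(value):
    """Reduce a Python int to the value a wrapping two's-complement int64 would hold."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN


def checked_mul_int64(a, b):
    """Multiply two integers, returning None when the exact result does not fit in int64."""
    result = a * b
    if result < INT64_MIN or result > INT64_MAX:
        return None
    return result


# One past the maximum wraps to the most negative value.
wrapped = wrap_int64(INT64_MAX + 1)
# A checked multiply reports the overflow instead of returning a wrapped value.
overflowed = checked_mul_int64(2**32, 2**32)
```

The checked variant mirrors the general idea behind detecting the overflow before the result is used, rather than continuing with a wrapped value.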

[Requirements] Ensure that all protocols are strictly defined, such that all out-of-bounds behavior can be identified simply, and require strict conformance to the protocol.
[Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. If possible, choose a language or compiler that performs automatic bounds checking.

Source: MITRE CWE corpus.