CVE-2021-41198: TensorFlow: tf.tile integer overflow crashes ML process

MEDIUM PoC AVAILABLE
Published November 5, 2021
CISO Take

A local attacker with low privileges can crash a TensorFlow process by calling tf.tile with arguments whose output element count overflows int64, which trips a CHECK failure and aborts the process. Upgrade to TensorFlow 2.4.4, 2.5.2, 2.6.1, or 2.7.0 (or later). The impact is limited to availability: the CVSS vector rates confidentiality and integrity impact as None.

Risk Assessment

Medium risk in isolation. The local attack vector limits exposure to multi-tenant training infrastructure, shared ML workspaces, and systems that accept untrusted model or graph inputs. In Jupyter-based environments or on shared GPU clusters, a malicious notebook can crash TF sessions that share its process. The flaw is not directly network-exploitable, but if TF is wrapped in a serving API that processes user-supplied tensor specs, the effective attack surface expands to the network.

Affected Systems

Package     Ecosystem  Vulnerable Range                               Patched
tensorflow  pip        < 2.4.4; >= 2.5.0, < 2.5.2; >= 2.6.0, < 2.6.1  2.4.4, 2.5.2, 2.6.1, 2.7.0

If you run a tensorflow release earlier than 2.4.4, 2.5.2, or 2.6.1 on its respective branch, you are affected.

Severity & Risk

CVSS 3.1
5.5 / 10
EPSS
0.05%
chance of exploitation in 30 days
Higher than 15% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Trivial
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

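Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High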

Recommended Action

5 steps
  1. Upgrade to TensorFlow 2.4.4, 2.5.2, 2.6.1, or 2.7.0. The fix is commit 9294094df6fea79271778eb7e7ae1bad8b5ef98f.

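  2. If patching is not immediately possible, add input validation before every tf.tile call: reject any request where the output element count (each input dimension multiplied by its corresponding multiple, taken over all dimensions) exceeds INT64_MAX.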

  3. In multi-tenant environments, enforce resource quotas and process isolation so a crashed session cannot affect others.

  4. Audit any serving layer that accepts external tensor dimensions, and reject inputs whose multiples would make the tiled output overflow int64.

  5. Detection: monitor for unexpected TF process exits or CHECK-failure log lines containing 'tile'.
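The input validation in step 2 can be sketched as a pure-Python pre-check. This is a minimal sketch, not TensorFlow's own fix: the helper name tile_output_fits_int64 and the max_elements parameter are hypothetical, and the check assumes the input shape is fully known before the call.

```python
import math
from typing import Sequence

INT64_MAX = 2**63 - 1


def tile_output_fits_int64(
    input_shape: Sequence[int],
    multiples: Sequence[int],
    max_elements: int = INT64_MAX,  # hypothetical knob; lower it to cap memory use
) -> bool:
    """Return True if tiling input_shape by multiples stays within max_elements."""
    if len(input_shape) != len(multiples):
        raise ValueError("multiples must have one entry per input dimension")
    if any(d < 0 for d in input_shape) or any(m < 0 for m in multiples):
        raise ValueError("dimensions and multiples must be non-negative")
    # Python ints are arbitrary precision, so this product cannot overflow.
    total = math.prod(d * m for d, m in zip(input_shape, multiples))
    return total <= max_elements


# Intended use (tf.tile itself is not called in this sketch):
#   if not tile_output_fits_int64(x.shape, multiples):
#       raise ValueError("tile request rejected: output too large")
#   y = tf.tile(x, multiples)
```

In practice a much lower max_elements is usually appropriate, since an output anywhere near INT64_MAX elements would exhaust memory long before it overflowed.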

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, robustness and cybersecurity of high-risk AI systems
ISO 42001
A.9.2 - AI system availability and resilience
NIST AI RMF
RE-1.1 - Reliability and robustness of AI systems
OWASP LLM Top 10
LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2021-41198?

CVE-2021-41198 is a denial-of-service flaw in TensorFlow's tf.tile. When the requested output would contain more elements than int64 can represent, TensorFlow detects the overflow through a CHECK statement and aborts the process. A local attacker with low privileges can trigger it. The fix is in TensorFlow 2.4.4, 2.5.2, 2.6.1, and 2.7.0. The impact is limited to availability.

Is CVE-2021-41198 actively exploited?

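This entry does not record active exploitation. Proof-of-concept exploit code for CVE-2021-41198 is publicly available (indexed in trickest/cve), which lowers the bar for exploitation, and the EPSS exploitation probability is 0.05%.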

How to fix CVE-2021-41198?

  1. Upgrade to TensorFlow 2.4.4, 2.5.2, 2.6.1, or 2.7.0. The fix is commit 9294094df6fea79271778eb7e7ae1bad8b5ef98f.
  2. If patching is not immediately possible, add input validation to reject tf.tile calls whose output element count would exceed INT64_MAX.
  3. In multi-tenant environments, enforce resource quotas and process isolation so a crashed session cannot affect others.
  4. Audit any serving layer that accepts external tensor dimensions, and reject inputs whose multiples would make the tiled output overflow int64.
  5. Detection: monitor for unexpected TF process exits or CHECK-failure log lines containing 'tile'.

What systems are affected by CVE-2021-41198?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, data preprocessing pipelines, shared ML workspaces.

What is the CVSS score for CVE-2021-41198?

CVE-2021-41198 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.05%.

Technical Details

NVD Description

TensorFlow is an open source platform for machine learning. In affected versions if `tf.tile` is called with a large input argument then the TensorFlow process will crash due to a `CHECK`-failure caused by an overflow. The number of elements in the output tensor is too much for the `int64_t` type and the overflow is detected via a `CHECK` statement. This aborts the process. The fix will be included in TensorFlow 2.7.0. We will also cherrypick this commit on TensorFlow 2.6.1, TensorFlow 2.5.2, and TensorFlow 2.4.4, as these are also affected and still in supported range.
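The overflow described above is plain arithmetic and can be checked without TensorFlow. The sketch below uses illustrative values (a 1x1x1 input and multiples of 10^8 per dimension, chosen here as an assumption, not taken from this entry) and a hypothetical helper, wrap_int64, that shows what a wrapping signed 64-bit integer would hold. It only illustrates why the count cannot be represented; per the description, TensorFlow detects the overflow and aborts.

```python
INT64_MAX = 2**63 - 1


def wrap_int64(value: int) -> int:
    """Reduce a Python int to the value a wrapping signed 64-bit integer would hold."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


# Illustrative inputs (assumed for this example).
input_shape = (1, 1, 1)
multiples = (10**8, 10**8, 10**8)

# Exact output element count, computed with arbitrary-precision ints.
true_count = 1
for dim, mult in zip(input_shape, multiples):
    true_count *= dim * mult

print("exact element count:", true_count)
print("exceeds INT64_MAX:  ", true_count > INT64_MAX)
print("wrapped int64 value:", wrap_int64(true_count))
```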

Exploitation Scenario

An adversary with access to a shared ML training cluster (for example, a compromised notebook account or a rogue data scientist) submits a job that calls tf.tile with multiples chosen so that the output would contain more than INT64_MAX elements. The TF process hits the CHECK assertion and aborts, taking down any training runs or serving workloads hosted in that same process. In a Kubernetes-based MLOps environment, repeated submissions cause repeated pod restarts, and the adversary can time them to disrupt production inference when the impact is greatest.
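One way to limit the blast radius in this scenario is to run untrusted work in a separate interpreter, so an abort kills only the child. The following is a minimal sketch under that assumption: run_isolated is a hypothetical helper, and the os.abort() call in the usage lines merely stands in for a CHECK-failure abort.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> int:
    """Run Python source in a child interpreter and return its exit status.

    A non-zero status means the child failed or was killed; the parent
    process keeps running either way.
    """
    completed = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return completed.returncode


# Stand-in for a job that aborts, as a CHECK failure would.
status = run_isolated("import os; os.abort()")
if status != 0:
    print("child exited abnormally with status", status, "(parent unaffected)")
```

Process isolation of this kind trades startup cost for containment; it complements, and does not replace, upgrading TensorFlow.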

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
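The 5.5 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The sketch below is a partial implementation that handles only Scope: Unchanged vectors such as this one; the function names are illustrative.

```python
import math

# CVSS v3.1 metric weights (PR values are those for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged CVSS v3.1 vector."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise NotImplementedError("this sketch covers Scope: Unchanged only")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

Here the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is about 1.83; their sum, about 5.43, rounds up to 5.5.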

Timeline

Published
November 5, 2021
Last Modified
November 21, 2024
First Seen
November 5, 2021

Related Vulnerabilities