CVE-2021-41199: TensorFlow: tf.image.resize integer overflow DoS

MEDIUM PoC AVAILABLE
Published November 5, 2021
CISO Take

A low-privileged local attacker can crash any TensorFlow process by passing oversized inputs to tf.image.resize, triggering a CHECK-failure via int64 overflow. This is a pure availability impact — no data exposure — but it can disrupt image preprocessing pipelines, training jobs, and inference servers in shared ML environments. Patch to TF 2.7.0, 2.6.1, 2.5.2, or 2.4.4 immediately; add input dimension validation as a defense-in-depth measure.

What is the risk?

Practical risk is low to moderate for most deployments. The local attack vector limits internet-facing exposure, but shared GPU clusters, multi-tenant ML platforms, and Jupyter notebook environments lower that barrier considerably. Exploitation is trivial: it requires no special knowledge, only a crafted input. Because the CVSS availability impact is rated HIGH, any TensorFlow-based production inference service that handles user-controlled images or resize dimensions is at real risk of disruption. The CVE is not listed in CISA KEV and there is no known active exploitation in the wild, which keeps it below urgent-critical priority.

What systems are affected?

Package | Ecosystem | Vulnerable Range | Patched
TensorFlow | pip | 2.4.x–2.6.x (before 2.4.4, 2.5.2, 2.6.1) | 2.4.4, 2.5.2, 2.6.1, 2.7.0

Do you use TensorFlow? Any release older than 2.7.0 that lacks the backported fix (2.6.1, 2.5.2, or 2.4.4) is affected.
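To check an installed version against the fixed releases named in the advisory (2.4.4, 2.5.2, 2.6.1, 2.7.0), a minimal sketch could look like the following. The function name `is_patched` is hypothetical, and treating branches older than 2.4 as unpatched is an assumption: the advisory only lists the branches that were still supported.

```python
import re

# First fixed patch release per minor branch, taken from the advisory text.
FIXED_PATCH = {(2, 4): 4, (2, 5): 2, (2, 6): 1}


def is_patched(version: str) -> bool:
    """Return True if `version` is at or above the fixed release for its branch.

    Pre-release suffixes such as "rc0" are ignored by this sketch.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) >= (2, 7):
        return True  # 2.7.0 and later include the fix
    if (major, minor) in FIXED_PATCH:
        return patch >= FIXED_PATCH[(major, minor)]
    # Assumption: branches older than 2.4 received no backport.
    return False
```

In an environment where TensorFlow is installed, the function could be called as `is_patched(tf.__version__)`.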

How severe is it?

CVSS 3.1
5.5 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 14% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Trivial
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

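Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High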

What should I do?

5 steps
  1. Patch: Upgrade TensorFlow to 2.7.0, or to the maintenance releases 2.6.1, 2.5.2, or 2.4.4, which include the cherry-picked fix (commit e5272d4).

  2. Input validation: Enforce hard limits on input image dimensions and tensor sizes at the API boundary before reaching TensorFlow ops — reject requests exceeding practical thresholds (e.g., max 65535x65535).

  3. Process isolation: Run inference workers in separate processes with supervisor restart (e.g., systemd, Kubernetes liveness probes) to auto-recover from crashes.

  4. Detection: Alert on abnormal process termination (CHECK-failure messages in TF logs) and anomalous input sizes in preprocessing telemetry.

  5. Audit: Identify all services and pipelines pinned to affected TF versions (2.4.x–2.6.x) and prioritize their upgrade.
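
The input validation step can be sketched as a small pre-check that runs before any TensorFlow op is reached. This is an illustration under stated assumptions, not part of TensorFlow: the function name `validate_resize_request` and the `max_elements` budget are hypothetical, and `max_side` simply reuses the 65535 example threshold from step 2.

```python
import operator


def validate_resize_request(batch, channels, target_height, target_width,
                            max_side=65535,
                            max_elements=65535 * 65535 * 4):
    """Return the output element count, or raise ValueError if out of bounds.

    max_side follows the 65535 example threshold; max_elements is a
    hypothetical budget (one 4-channel image at the maximum side length).
    """
    checked = {}
    for name, value in (("batch", batch), ("channels", channels),
                        ("target_height", target_height),
                        ("target_width", target_width)):
        try:
            number = operator.index(value)  # rejects floats, strings, None
        except TypeError:
            raise ValueError(f"{name} must be an integer, got {value!r}") from None
        if number <= 0:
            raise ValueError(f"{name} must be positive, got {number}")
        checked[name] = number

    for name in ("target_height", "target_width"):
        if checked[name] > max_side:
            raise ValueError(f"{name}={checked[name]} exceeds limit {max_side}")

    # Python integers do not overflow, so this product is exact.
    total = (checked["batch"] * checked["target_height"]
             * checked["target_width"] * checked["channels"])
    if total > max_elements:
        raise ValueError(f"output would hold {total} elements, limit is {max_elements}")
    return total
```

Doing the arithmetic in Python before the call means the element count is computed exactly, so the request can be rejected with an ordinary exception instead of reaching the native code path that aborts.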

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.6 - AI system availability and resilience
NIST AI RMF
MANAGE-2.2 - Mechanisms to sustain the value of deployed AI systems are evaluated and in place
MAP-5.1 - Likelihood and magnitude of each identified impact based on exposure
OWASP LLM Top 10
LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2021-41199?

CVE-2021-41199 is an integer overflow (CWE-190) in TensorFlow's tf.image.resize. When the op is called with a large input argument, the number of elements in the output tensor exceeds the int64_t range, a CHECK statement detects the overflow, and the process aborts. The impact is limited to availability, with no data exposure. It is fixed in TensorFlow 2.7.0, 2.6.1, 2.5.2, and 2.4.4.

Is CVE-2021-41199 actively exploited?

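There is no known active exploitation in the wild, and the CVE is not listed in CISA KEV. However, proof-of-concept exploit code is publicly available for CVE-2021-41199, which increases the risk of exploitation.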

How to fix CVE-2021-41199?

  1. Patch: Upgrade TensorFlow to 2.7.0, or to the maintenance releases 2.6.1, 2.5.2, or 2.4.4, which include the cherry-picked fix (commit e5272d4).

  2. Input validation: Enforce hard limits on input image dimensions and tensor sizes at the API boundary before reaching TensorFlow ops, and reject requests exceeding practical thresholds (e.g., max 65535x65535).

  3. Process isolation: Run inference workers in separate processes with supervisor restart (e.g., systemd, Kubernetes liveness probes) to auto-recover from crashes.

  4. Detection: Alert on abnormal process termination (CHECK-failure messages in TF logs) and anomalous input sizes in preprocessing telemetry.

  5. Audit: Identify all services and pipelines pinned to affected TF versions (2.4.x–2.6.x) and prioritize their upgrade.
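
The process isolation idea can be illustrated with a toy supervisor that runs a worker as a child process and restarts it after an abnormal exit. This is only a sketch of the principle; in practice systemd or Kubernetes would do this job. The name `run_with_restart` is hypothetical, and `ABORTING_CHILD` is a stand-in that calls `os.abort()` to mimic a process killed by a CHECK failure rather than a real TensorFlow worker.

```python
import subprocess
import sys

# Stand-in worker: aborts immediately, the way a CHECK failure would.
ABORTING_CHILD = [sys.executable, "-c", "import os; os.abort()"]


def run_with_restart(argv, max_restarts=3):
    """Run argv as a child process and restart it after abnormal exits.

    Returns the list of exit codes observed, one per attempt.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        completed = subprocess.run(argv)
        exit_codes.append(completed.returncode)
        if completed.returncode == 0:
            break  # clean exit, nothing to recover from
    return exit_codes
```

Because the worker runs in its own process, the abort terminates only the child; the supervising process survives and can restart it or raise an alert once the restart budget is exhausted.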

What systems are affected by CVE-2021-41199?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, image preprocessing pipelines, multi-tenant ML platforms, MLOps/TFX pipelines.

What is the CVSS score for CVE-2021-41199?

CVE-2021-41199 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.23%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, image preprocessing pipelines, multi-tenant ML platforms, MLOps/TFX pipelines

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service
AML.T0034 Cost Harvesting
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.2.6
NIST AI RMF: MANAGE-2.2, MAP-5.1
OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

TensorFlow is an open source platform for machine learning. In affected versions if `tf.image.resize` is called with a large input argument then the TensorFlow process will crash due to a `CHECK`-failure caused by an overflow. The number of elements in the output tensor is too much for the `int64_t` type and the overflow is detected via a `CHECK` statement. This aborts the process. The fix will be included in TensorFlow 2.7.0. We will also cherrypick this commit on TensorFlow 2.6.1, TensorFlow 2.5.2, and TensorFlow 2.4.4, as these are also affected and still in supported range.

Exploitation Scenario

An adversary with low-privilege access to a shared ML inference platform (such as a compromised internal account, a malicious notebook user, or an attacker who reached an exposed TF Serving endpoint) crafts a request with image dimensions large enough to cause int64_t overflow in the output element count computation. For example, a resize target of about 2 billion x 2 billion on a 3-channel image implies roughly 1.2 × 10^19 output elements, above the int64_t maximum of about 9.2 × 10^18. When tf.image.resize processes this input, the CHECK assertion fires and aborts the TensorFlow process. In a production model server handling concurrent inference requests, this terminates all in-flight predictions. In a continuous training pipeline, it halts the training job mid-epoch, requiring manual intervention. On a multi-tenant GPU cluster, a single malicious user can repeatedly crash shared TF workers, creating a sustained denial-of-service condition for all tenants.
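
The arithmetic behind the overflow can be checked with a short sketch. Python integers are arbitrary precision, so the exact element count can be compared against the int64 limit and the fixed-width wraparound can be simulated. The helper names are hypothetical, and the wraparound shown is what two's-complement int64 arithmetic would produce; TensorFlow's own check is implemented in C++ and is not reproduced here.

```python
INT64_MAX = 2**63 - 1


def wrap_int64(value: int) -> int:
    """Reinterpret an arbitrary Python int as a two's-complement int64."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def output_element_count(batch, height, width, channels):
    """Return (exact count, count as wrapped int64, whether int64 overflows)."""
    exact = batch * height * width * channels
    return exact, wrap_int64(exact), exact > INT64_MAX


# An ordinary resize target stays far below the limit.
small = output_element_count(1, 224, 224, 3)

# Hypothetical oversized target: 2 billion x 2 billion, 3 channels.
huge = output_element_count(1, 2_000_000_000, 2_000_000_000, 3)
```

For the oversized case the exact count exceeds `INT64_MAX`, and the wrapped value comes out negative, which is the kind of impossible element count an overflow check is there to catch.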

Weaknesses (CWE)

CWE-190 — Integer Overflow or Wraparound: The product performs a calculation that can produce an integer overflow or wraparound when the logic assumes that the resulting value will always be larger than the original value. This occurs when an integer value is incremented to a value that is too large to store in the associated representation. When this occurs, the value may become a very small or negative number.

  • [Requirements] Ensure that all protocols are strictly defined, such that all out-of-bounds behavior can be identified simply, and require strict conformance to the protocol.
  • [Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. If possible, choose a language or compiler that performs automatic bounds checking.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
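
The 5.5 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The sketch below is a minimal calculator that handles only Scope: Unchanged vectors, which is all this CVE needs; the function names are hypothetical, while the metric weights and the rounding rule follow the published v3.1 specification.

```python
# Metric weights from the CVSS v3.1 specification (PR values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (scaled // 10000 + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise NotImplementedError("this sketch only covers Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))
```

For this vector the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.83; their sum, about 5.43, rounds up to 5.5.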

Timeline

Published
November 5, 2021
Last Modified
November 21, 2024
First Seen
November 5, 2021

Related Vulnerabilities