CVE-2021-41199: TensorFlow: tf.image.resize integer overflow DoS

MEDIUM PoC AVAILABLE
Published November 5, 2021
CISO Take

A low-privileged local attacker can crash any TensorFlow process by passing oversized inputs to tf.image.resize, triggering a CHECK-failure via int64 overflow. This is a pure availability impact — no data exposure — but it can disrupt image preprocessing pipelines, training jobs, and inference servers in shared ML environments. Patch to TF 2.7.0, 2.6.1, 2.5.2, or 2.4.4 immediately; add input dimension validation as a defense-in-depth measure.

Risk Assessment

Practical risk is moderate-low for most deployments. The local attack vector limits internet-facing exposure, but shared GPU clusters, multi-tenant ML platforms, and Jupyter notebook environments reduce this barrier significantly. Exploitability is trivial — no special knowledge required, just a crafted input. With CVSS availability impact rated HIGH, any TensorFlow-based production inference service handling user-controlled images is at real risk of disruption. The absence from CISA KEV and no known active exploitation in the wild keeps this from being urgent-critical.

Affected Systems

Package | Ecosystem | Vulnerable Range | Patched
tensorflow | pip | before 2.4.4; 2.5.x before 2.5.2; 2.6.x before 2.6.1 | 2.4.4, 2.5.2, 2.6.1, 2.7.0

Do you use tensorflow? You are affected if you run a release earlier than 2.4.4, 2.5.2, or 2.6.1 in its respective series.

Severity & Risk

CVSS 3.1
5.5 / 10
EPSS
0.05%
chance of exploitation in 30 days
Higher than 15% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Trivial
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV Local
AC Low
PR Low
UI None
S Unchanged
C None
I None
A High

Recommended Action

  1. Patch: Upgrade TensorFlow to 2.7.0, or to the patched maintenance release for your series (2.6.1, 2.5.2, or 2.4.4), each of which includes the cherry-picked fix (commit e5272d4).

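  2. Input validation: Enforce hard limits on input image dimensions and tensor sizes at the API boundary, before the data reaches TensorFlow ops, and reject requests that exceed them (e.g., max 65535x65535). A cap of that size rules out the overflow for realistic batch and channel counts; a lower cap sized to your memory budget is usually more practical.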

  3. Process isolation: Run inference workers in separate processes with supervisor restart (e.g., systemd, Kubernetes liveness probes) to auto-recover from crashes.

  4. Detection: Alert on abnormal process termination (CHECK-failure messages in TF logs) and anomalous input sizes in preprocessing telemetry.

  5. Audit: Identify all services and pipelines pinned to affected TF versions (2.4.x–2.6.x) and prioritize their upgrade.
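The input validation step above can be sketched as a small pre-check that runs before any call to tf.image.resize. This is a minimal illustration, not TensorFlow's own fix: the function name `validate_resize_request` and the limits `MAX_SIDE` and `MAX_ELEMENTS` are hypothetical and should be tuned to your workload. It assumes the usual `(height, width, channels)` or `(batch, height, width, channels)` input layouts.

```python
# Hypothetical limits; pick values that match your memory budget.
MAX_SIDE = 8192            # largest allowed output height or width
MAX_ELEMENTS = 2 ** 28     # largest allowed output element count


def validate_resize_request(input_shape, target_size,
                            max_side=MAX_SIDE, max_elements=MAX_ELEMENTS):
    """Return the output element count, or raise ValueError if it is unsafe.

    input_shape: (h, w, c) or (batch, h, w, c)
    target_size: (new_height, new_width)
    """
    if len(input_shape) == 3:
        batch, channels = 1, input_shape[2]
    elif len(input_shape) == 4:
        batch, channels = input_shape[0], input_shape[3]
    else:
        raise ValueError("expected a 3-D or 4-D image shape")

    if len(target_size) != 2:
        raise ValueError("target_size must be (new_height, new_width)")

    for side in target_size:
        if not isinstance(side, int) or side <= 0:
            raise ValueError("target dimensions must be positive integers")
        if side > max_side:
            raise ValueError(f"target dimension {side} exceeds limit {max_side}")

    # Python integers have arbitrary precision, so this product cannot wrap.
    new_height, new_width = target_size
    out_elements = batch * new_height * new_width * channels
    if out_elements > max_elements:
        raise ValueError(f"output would hold {out_elements} elements, "
                         f"limit is {max_elements}")
    return out_elements
```

A caller would run the check first and only then pass the same size to tf.image.resize, turning a process abort into an ordinary rejected request.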

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.6 - AI system availability and resilience
NIST AI RMF
MANAGE-2.2 - Mechanisms to sustain the value of deployed AI systems are evaluated and in place
MAP-5.1 - Likelihood and magnitude of each identified impact based on exposure
OWASP LLM Top 10
LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2021-41199?

CVE-2021-41199 is a denial-of-service vulnerability in TensorFlow's tf.image.resize. When the function is called with a very large size argument, the number of elements in the output tensor overflows int64_t; a CHECK statement detects the overflow and aborts the process. The impact is limited to availability, with no data exposure or modification. The fix ships in TensorFlow 2.7.0, 2.6.1, 2.5.2, and 2.4.4.

Is CVE-2021-41199 actively exploited?

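There is no known active exploitation in the wild, and the CVE is not listed in CISA KEV. However, proof-of-concept exploit code is publicly available (indexed in trickest/cve), which lowers the bar for an attacker.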

How to fix CVE-2021-41199?

  1. Patch: Upgrade TensorFlow to 2.7.0, or to the patched maintenance release for your series (2.6.1, 2.5.2, or 2.4.4), each of which includes the cherry-picked fix (commit e5272d4).
  2. Input validation: Enforce hard limits on input image dimensions and tensor sizes at the API boundary, before the data reaches TensorFlow ops, and reject requests that exceed them (e.g., max 65535x65535).
  3. Process isolation: Run inference workers in separate processes with supervisor restart (e.g., systemd, Kubernetes liveness probes) to auto-recover from crashes.
  4. Detection: Alert on abnormal process termination (CHECK-failure messages in TF logs) and anomalous input sizes in preprocessing telemetry.
  5. Audit: Identify all services and pipelines pinned to affected TF versions (2.4.x–2.6.x) and prioritize their upgrade.

What systems are affected by CVE-2021-41199?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, image preprocessing pipelines, multi-tenant ML platforms, MLOps/TFX pipelines.

What is the CVSS score for CVE-2021-41199?

CVE-2021-41199 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.05%.

Technical Details

NVD Description

TensorFlow is an open source platform for machine learning. In affected versions if `tf.image.resize` is called with a large input argument then the TensorFlow process will crash due to a `CHECK`-failure caused by an overflow. The number of elements in the output tensor is too much for the `int64_t` type and the overflow is detected via a `CHECK` statement. This aborts the process. The fix will be included in TensorFlow 2.7.0. We will also cherrypick this commit on TensorFlow 2.6.1, TensorFlow 2.5.2, and TensorFlow 2.4.4, as these are also affected and still in supported range.
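The arithmetic behind the overflow can be illustrated in a few lines. The sketch below is not TensorFlow code: `output_elements` and `wrap_int64` are hypothetical helpers that compute the exact element count with Python's arbitrary-precision integers and then show what a 64-bit signed counter would hold after wrapping. It assumes each resize dimension is bounded by the int32 maximum, as the documented size argument of tf.image.resize is an int32 tensor.

```python
INT64_MAX = 2 ** 63 - 1
INT32_MAX = 2 ** 31 - 1


def output_elements(batch, height, width, channels):
    """Exact number of elements in a (batch, height, width, channels) tensor."""
    return batch * height * width * channels


def wrap_int64(n):
    """Value a 64-bit two's-complement signed integer would hold for n."""
    return (n + 2 ** 63) % 2 ** 64 - 2 ** 63


# A single 3-channel image resized to the largest int32 height and width.
exact = output_elements(1, INT32_MAX, INT32_MAX, 3)

print("exact element count :", exact)
print("fits in int64_t     :", exact <= INT64_MAX)
print("wrapped int64 value :", wrap_int64(exact))
```

Two dimensions at the int32 maximum multiply to just under 2^62, which still fits in int64_t, so it is the extra batch or channel factor that pushes the count past 2^63 - 1. The wrapped value comes out negative, which is the kind of inconsistency a CHECK statement catches.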

Exploitation Scenario

An adversary with low-privilege access to a shared ML inference platform (for example a compromised internal account, a malicious notebook user, or an attacker who has reached an exposed TF Serving endpoint) crafts a request whose resize dimensions are large enough to overflow int64_t in the output element count computation. A target of roughly 2 billion x 2 billion on a 3-channel image is enough, because the element count then exceeds 2^63 - 1. When tf.image.resize processes this input, the CHECK assertion fires and aborts the TensorFlow process. In a production model server handling concurrent inference requests, this terminates all in-flight predictions. In a continuous training pipeline, it halts the training job mid-epoch and requires manual intervention. On a multi-tenant GPU cluster, a single malicious user can repeatedly crash shared TF workers, creating a sustained denial-of-service condition for all tenants.
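Because the failure mode is a process abort rather than a Python exception, a try/except around the resize call does not help; the containment has to happen at the process boundary. The sketch below illustrates that idea under stated assumptions: `run_isolated` is a hypothetical helper, and the crashing worker is simulated with `os.abort()` as a stand-in for a CHECK-failure, so no TensorFlow install is needed to run it.

```python
import subprocess
import sys
from typing import Tuple


def run_isolated(code: str, timeout_s: float = 30.0) -> Tuple[bool, int]:
    """Run a Python snippet in a child process.

    Returns (succeeded, return_code). An abort in the child only ends the
    child; the caller survives and can reject the request or restart a worker.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        timeout=timeout_s,
    )
    return proc.returncode == 0, proc.returncode


# Stand-in for a worker that hits a CHECK-failure and aborts (simulated).
CRASHING_WORKER = "import os; os.abort()"
# Stand-in for a worker that completes normally.
HEALTHY_WORKER = "print('resized ok')"

if __name__ == "__main__":
    for name, code in [("crashing", CRASHING_WORKER), ("healthy", HEALTHY_WORKER)]:
        ok, rc = run_isolated(code)
        print(f"{name} worker: succeeded={ok}, return code={rc}")
```

In production the same role is played by a supervisor such as systemd or a Kubernetes liveness probe; the point of the sketch is only that the parent keeps running and sees a nonzero exit status.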

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
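The 5.5 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The following is a minimal sketch covering base metrics only (no temporal or environmental metrics); the function names `roundup` and `base_score` are illustrative, and the weights are the ones published in the CVSS v3.1 specification.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on Scope (U = unchanged, C = changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score for a vector string."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are supported")
    m = dict(p.split(":") for p in parts[1:])
    scope = m["S"]

    # Impact sub-score from confidentiality, integrity, availability.
    iss = 1 - ((1 - WEIGHTS["C"][m["C"]])
               * (1 - WEIGHTS["I"][m["I"]])
               * (1 - WEIGHTS["A"][m["A"]]))
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15

    exploitability = (8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                      * PR_WEIGHTS[scope][m["PR"]] * WEIGHTS["UI"][m["UI"]])

    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

For this vector the impact term is 6.42 x 0.56 ≈ 3.60 and the exploitability term is about 1.83; their sum, about 5.43, rounds up to 5.5.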

Timeline

Published
November 5, 2021
Last Modified
November 21, 2024
First Seen
November 5, 2021

Related Vulnerabilities