CVE-2021-29550: TensorFlow: FractionalAvgPool DoS via divide-by-zero

MEDIUM PoC AVAILABLE
Published May 14, 2021

CISO Take

A locally-exploitable divide-by-zero in TensorFlow's FractionalAvgPool op crashes the TF runtime when attacker-controlled tensor shapes cause output_size to reach zero. Exposure is limited to systems where untrusted users can submit TensorFlow operations (e.g., shared ML platforms, multi-tenant training environments). Patch immediately to TF 2.5.0 or the applicable backport (2.4.2, 2.3.3, 2.2.3, 2.1.4) and restrict who can submit raw TF ops.

What is the risk?

Effective risk is LOW-MEDIUM for most organizations. The local attack vector (AV:L) and low-privilege requirement mean this is not remotely exploitable without first gaining system access. In multi-tenant ML platforms or Jupyter-style environments where users can execute arbitrary TF ops, risk elevates to MEDIUM since a single malicious notebook cell can crash a shared inference or training node. No active exploitation reported; not in CISA KEV. CVSS 5.5 is accurate for isolated deployments but underestimates impact in shared-compute AI infrastructure.

What systems are affected?

Package     Ecosystem  Vulnerable Range                                          Patched
TensorFlow  pip        < 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1   2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0

If you run a TensorFlow release older than 2.5.0 that lacks the 2.4.2, 2.3.3, 2.2.3, or 2.1.4 backport, you are affected.
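A quick way to triage an installed version is to compare it against the fixed releases named in the advisory. The sketch below is illustrative only: `is_patched` and the lookup table are hypothetical names, and it assumes that any series without a listed backport (for example 2.0.x or 1.x) is unpatched and that release-candidate suffixes can be ignored.

```python
import re

# Fixed micro versions per (major, minor) series, taken from the advisory text.
PATCHED_MICRO = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
# Every series from 2.5 onward ships with the fix.
FIRST_FIXED_SERIES = (2, 5)


def is_patched(version: str) -> bool:
    """Return True if a TensorFlow version string includes the fix.

    Assumption: suffixes such as "rc0" are ignored, so a pre-release is
    treated like its base version. Verify pre-releases by hand.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    micro = int(match.group(3) or 0)
    series = (major, minor)
    if series >= FIRST_FIXED_SERIES:
        return True
    if series in PATCHED_MICRO:
        return micro >= PATCHED_MICRO[series]
    # Older series received no backport according to the advisory.
    return False
```

In practice the argument would come from `tf.__version__` or the `Version:` line printed by `pip show tensorflow`.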

How severe is it?

CVSS 3.1
5.5 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 9% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Trivial
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Local
AC Low
PR Low
UI None
S Unchanged
C None
I None
A High

What should I do?

5 steps
  1. Patch

    Upgrade TensorFlow to 2.5.0 or backports 2.4.2 / 2.3.3 / 2.2.3 / 2.1.4 per commit 548b5eaf23685d86f722233d8fbc21d0a4aecb96. Run pip show tensorflow to confirm version.

  2. Validate inputs

    Add pre-op assertion that input_size[i] >= pooling_ratio[i] before invoking FractionalAvgPool; reject or sanitize out-of-range values at API boundaries.

  3. Restrict op access

    If running a shared ML platform, use TF op allowlists or sandboxed execution (e.g., TF Serving with input schema validation) to prevent raw op invocation by untrusted users.

  4. Detect

    Monitor for unexpected TF process crashes or segfaults in training/serving logs; repeated crashes on FractionalAvgPool inputs may indicate probing.

  5. Isolate jobs

    Run multi-tenant training jobs in separate containers/processes so a crash in one job cannot affect others.
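The input check from step 2 can be sketched in plain Python, without importing TensorFlow. The function name `check_fractional_pool_args` is hypothetical; the only logic taken from the advisory is that each output dimension is the floor of `input_size[i] / pooling_ratio[i]` and must not be zero. Rejecting non-positive ratios is an extra assumption added so the guard itself cannot divide by zero.

```python
import math
from typing import List, Sequence


def check_fractional_pool_args(
    input_shape: Sequence[int], pooling_ratio: Sequence[float]
) -> List[int]:
    """Validate shape/ratio pairs and return the implied output shape.

    Raises ValueError when any output dimension would be zero, which is the
    condition that leads to the divide-by-zero in unpatched builds.
    """
    if len(input_shape) != len(pooling_ratio):
        raise ValueError("input_shape and pooling_ratio must have equal length")
    output_shape = []
    for dim, (size, ratio) in enumerate(zip(input_shape, pooling_ratio)):
        if ratio <= 0:
            # Assumption: a non-positive ratio is never meaningful here.
            raise ValueError(f"pooling_ratio[{dim}] must be positive, got {ratio}")
        out = math.floor(size / ratio)
        if out <= 0:
            raise ValueError(
                f"dimension {dim}: input size {size} is smaller than "
                f"pooling ratio {ratio}, so the output size would be 0"
            )
        output_shape.append(out)
    return output_shape
```

A caller would run this at the API boundary and only invoke the op when it returns without raising.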

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
8.4 - AI system operation
A.9.2 - AI system security
NIST AI RMF
GOVERN-6.1 - Policies and procedures are in place for AI risk management
MANAGE-2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM07:2025 - System Prompt Leakage (LLM07 was "Insecure Plugin Design" in the 2023 list)

Frequently Asked Questions

What is CVE-2021-29550?

CVE-2021-29550 is a locally exploitable divide-by-zero in TensorFlow's FractionalAvgPool op. When attacker-controlled tensor shapes and pooling ratios drive output_size to zero, the TF runtime crashes. Exposure is limited to systems where untrusted users can submit TensorFlow operations (e.g., shared ML platforms, multi-tenant training environments). The fix ships in TF 2.5.0 and in the 2.4.2, 2.3.3, 2.2.3, and 2.1.4 backports.

Is CVE-2021-29550 actively exploited?

No active exploitation has been reported, and CVE-2021-29550 is not listed in CISA KEV. Proof-of-concept exploit code is publicly available, however, which increases the risk of exploitation.

How to fix CVE-2021-29550?

1. **Patch**: Upgrade TensorFlow to 2.5.0 or backports 2.4.2 / 2.3.3 / 2.2.3 / 2.1.4 per commit 548b5eaf23685d86f722233d8fbc21d0a4aecb96. Run `pip show tensorflow` to confirm version.
2. **Validate inputs**: Add pre-op assertion that `input_size[i] >= pooling_ratio[i]` before invoking FractionalAvgPool; reject or sanitize out-of-range values at API boundaries.
3. **Restrict op access**: If running a shared ML platform, use TF op allowlists or sandboxed execution (e.g., TF Serving with input schema validation) to prevent raw op invocation by untrusted users.
4. **Detect**: Monitor for unexpected TF process crashes or segfaults in training/serving logs; repeated crashes on FractionalAvgPool inputs may indicate probing.
5. **Isolate jobs**: Run multi-tenant training jobs in separate containers/processes so a crash in one job cannot affect others.
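Process-level isolation (step 5) can be illustrated with the standard library alone. This is a minimal sketch, not a production job runner: `run_isolated` is a hypothetical helper, and `os.abort()` stands in for the native crash so the example runs without TensorFlow installed.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 30.0) -> int:
    """Run Python source in a child interpreter and return its exit status.

    A crash in the child ends only the child; the caller just sees a
    non-zero (or negative, signal-based) return code.
    """
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        timeout=timeout,
    )
    return completed.returncode


# Stand-in for a job that hits a fatal native error.
crash_status = run_isolated("import os; os.abort()")
# A well-behaved job for comparison.
ok_status = run_isolated("print('job finished')")
```

The parent process survives both calls and can log, alert on, or reschedule the failed job based on the status it gets back. Containers give the same property with stronger resource and filesystem boundaries.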

What systems are affected by CVE-2021-29550?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML platforms, inference servers.

What is the CVSS score for CVE-2021-29550?

CVE-2021-29550 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.19%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, shared ML platforms, inference servers

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0029 Denial of AI Service
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: 8.4, A.9.2
NIST AI RMF: GOVERN-6.1, MANAGE-2.2
OWASP LLM Top 10: LLM07:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. An attacker can cause a runtime division by zero error and denial of service in `tf.raw_ops.FractionalAvgPool`. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/acc8ee69f5f46f92a3f1f11230f49c6ac266f10c/tensorflow/core/kernels/fractional_avg_pool_op.cc#L85-L89) computes a divisor quantity by dividing two user controlled values. The user controls the values of `input_size[i]` and `pooling_ratio_[i]` (via the `value.shape()` and `pooling_ratio` arguments). If the value in `input_size[i]` is smaller than the `pooling_ratio_[i]`, then the floor operation results in `output_size[i]` being 0. The `DCHECK_GT` line is a no-op outside of debug mode, so in released versions of TF this does not trigger.

Later, these computed values are used as arguments (https://github.com/tensorflow/tensorflow/blob/acc8ee69f5f46f92a3f1f11230f49c6ac266f10c/tensorflow/core/kernels/fractional_avg_pool_op.cc#L96-L99) to `GeneratePoolingSequence` (https://github.com/tensorflow/tensorflow/blob/acc8ee69f5f46f92a3f1f11230f49c6ac266f10c/tensorflow/core/kernels/fractional_pool_common.cc#L100-L108). There, the first computation is a division in a modulo operation. Since `output_length` can be 0, this results in runtime crashing.

The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
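The arithmetic the advisory describes can be modelled in a few lines of plain Python. This is a simplified illustration of the two steps, not the TensorFlow source: the function names are hypothetical, and Python raises a catchable `ZeroDivisionError` where the C++ kernel performs an integer modulo by zero and crashes.

```python
import math
from typing import List, Sequence


def output_sizes(input_size: Sequence[int], pooling_ratio: Sequence[float]) -> List[int]:
    """Model of the kernel's first step: floor(input_size[i] / pooling_ratio[i])."""
    return [math.floor(n / r) for n, r in zip(input_size, pooling_ratio)]


def divides_evenly(input_length: int, output_length: int) -> bool:
    """Model of the first check in the pooling-sequence step: a modulo by output_length."""
    return input_length % output_length == 0


# Shape and ratio where two dimensions are smaller than their ratio.
sizes = output_sizes([1, 2, 2, 1], [1.0, 3.0, 3.0, 1.0])

try:
    divides_evenly(2, sizes[1])
    modulo_failed = False
except ZeroDivisionError:
    # Python surfaces the error as an exception; the unpatched native
    # kernel has no such guard in release builds.
    modulo_failed = True
```

Here `sizes` contains zeros in the two pooled dimensions, and the modulo on either of them fails, which mirrors the path from an undersized input to the crash.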

Exploitation Scenario

An adversary with access to a shared ML training platform (e.g., internal Jupyter Hub, MLflow Projects, or a model-as-a-service endpoint accepting custom model files) uploads or executes a model containing a FractionalAvgPool layer configured with `pooling_ratio > input_size` (e.g., input shape [1, 2, 2, 1] with pooling_ratio [1.0, 3.0, 3.0, 1.0]). When the operator executes, `output_size` computes to 0 in the pooled dimensions, the DCHECK is suppressed in release builds, and the subsequent modulo operation divides by zero, crashing the TF runtime. In a shared compute cluster without per-job isolation, this can take down the training node's TF process and cause a denial of service for all co-located jobs. An attacker could automate this to keep the platform continuously unavailable.

Weaknesses (CWE)

CWE-369 — Divide By Zero: The product divides a value by zero.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
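The 5.5 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The sketch below handles only Scope Unchanged vectors such as this one; the metric weights and the round-up rule follow the CVSS v3.1 specification, while the function and table names are illustrative.

```python
import math

# CVSS v3.1 metric weights (PR values shown are the Scope Unchanged ones).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer method from the v3.1 spec."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with Scope Unchanged."""
    fields = dict(item.split(":") for item in vector.split("/"))
    if fields.get("CVSS") != "3.1" or fields.get("S") != "U":
        raise ValueError("this sketch only supports CVSS:3.1 with S:U")
    w = {name: table[fields[name]] for name, table in WEIGHTS.items()}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H")
```

For this vector the impact sub-score is about 3.60 and the exploitability sub-score about 1.83; their sum of roughly 5.43 rounds up to 5.5.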

Timeline

Published
May 14, 2021
Last Modified
November 21, 2024
First Seen
May 14, 2021

Related Vulnerabilities