CVE-2021-37674: TensorFlow: DoS via MaxPoolGrad invalid tensor input

MEDIUM
Published August 12, 2021
CISO Take

An authenticated local user can crash TensorFlow training jobs by passing malformed tensors to MaxPoolGrad, triggering a segmentation fault. Patch to TF 2.6.0, 2.5.1, 2.4.3, or 2.3.4 immediately in shared ML compute environments (notebooks, training clusters). No remote exploitation or data exfiltration risk — availability is the sole concern.

What is the risk?

MEDIUM. Local attack vector significantly limits exposure, but multi-tenant ML platforms with low-privilege users face real availability risk. Exploitation is trivial — crafting malformed tensor inputs requires no ML expertise, only basic TF API knowledge. CVSS A:H means complete DoS of the affected TF process. Not in CISA KEV; no reported in-the-wild exploitation. Incomplete fix history (CVE-2021-29579) suggests this class of missing-validation issues warrants broader audit of TF raw ops.

What systems are affected?

Package      Ecosystem   Vulnerable Range   Patched
TensorFlow   pip         not specified      2.3.4, 2.4.3, 2.5.1, 2.6.0

If you run a TensorFlow release older than 2.3.4, 2.4.3, 2.5.1, or 2.6.0 on its respective branch, you are affected.

How severe is it?

CVSS 3.1: 5.5 / 10
EPSS: 0.2% chance of exploitation in 30 days (higher than 8% of all CVEs)
Exploitation Status: No known exploitation
Sophistication: Trivial

What is the attack surface?

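Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High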

What should I do?

5 steps
  1. PATCH

    Upgrade to TensorFlow 2.6.0, 2.5.1, 2.4.3, or 2.3.4 — all contain the fix at commit 136b51f.

  2. WORKAROUND

    Validate tensor shapes and dtypes before passing to tf.raw_ops in multi-tenant environments; add input sanitization layers at job submission boundaries.

  3. ISOLATION

    Run ML training workloads in isolated containers with scoped permissions so a crash in one job does not cascade.

  4. DETECTION

    Monitor for SIGSEGV signals in TF training processes; anomalous crash rates in MaxPool-heavy CNN training are an indicator.

  5. AUDIT

    Review usage of other tf.raw_ops that accept tensor pairs (orig_input/orig_output pattern) for similar missing-validation issues.
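The WORKAROUND step above can be sketched as a shape check that runs before a job is allowed to call tf.raw_ops.MaxPoolGrad. This is a minimal illustration, not the validation TensorFlow added in its fix: the function names are hypothetical, it assumes 4-D NHWC tensors, and it conservatively rejects zero-sized dimensions. It uses plain Python tuples so it can run at a job submission boundary without importing TensorFlow.

```python
import math


def expected_pool_output_shape(input_shape, ksize, strides, padding):
    """Return the output shape of a 2-D max pool over a 4-D NHWC input.

    Hypothetical helper: applies the usual VALID/SAME size formulas to
    every dimension (batch and channel use window 1 and stride 1).
    """
    if len(input_shape) != 4 or len(ksize) != 4 or len(strides) != 4:
        raise ValueError("input_shape, ksize and strides must each have 4 entries")
    if any(d <= 0 for d in (*input_shape, *ksize, *strides)):
        raise ValueError("dimensions, window sizes and strides must be positive")
    if padding not in ("VALID", "SAME"):
        raise ValueError("padding must be 'VALID' or 'SAME'")

    out = []
    for size, window, stride in zip(input_shape, ksize, strides):
        if padding == "SAME":
            out.append(math.ceil(size / stride))
        else:
            if window > size:
                raise ValueError("pooling window is larger than the input")
            out.append((size - window) // stride + 1)
    return tuple(out)


def check_maxpoolgrad_shapes(orig_input_shape, orig_output_shape, grad_shape,
                             ksize, strides, padding):
    """Raise ValueError unless orig_output and grad match the pooled shape."""
    expected = expected_pool_output_shape(orig_input_shape, ksize, strides, padding)
    if tuple(orig_output_shape) != expected:
        raise ValueError(f"orig_output shape {tuple(orig_output_shape)} != {expected}")
    if tuple(grad_shape) != expected:
        raise ValueError(f"grad shape {tuple(grad_shape)} != {expected}")
```

A caller would pass the static shapes of the three tensors and reject the job when ValueError is raised. Dtype checks, which the step also mentions, are not covered by this sketch.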

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Article 15 - Accuracy, robustness and cybersecurity
ISO 42001: A.9.2 - AI system performance assessment
NIST AI RMF: MANAGE 2.2 - Residual risks identified and monitored
OWASP LLM Top 10: LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2021-37674?

CVE-2021-37674 is a denial-of-service vulnerability in TensorFlow. tf.raw_ops.MaxPoolGrad does not fully validate its orig_input and orig_output tensors, so an authenticated local user can crash a TensorFlow process with a segmentation fault by passing malformed tensors. The fix ships in TF 2.6.0, 2.5.1, 2.4.3, and 2.3.4; upgrade promptly in shared ML compute environments (notebooks, training clusters). There is no remote exploitation or data exfiltration risk: availability is the sole concern.

Is CVE-2021-37674 actively exploited?

No confirmed active exploitation of CVE-2021-37674 has been reported, but organizations should still patch proactively.

How to fix CVE-2021-37674?

1. PATCH: Upgrade to TensorFlow 2.6.0, 2.5.1, 2.4.3, or 2.3.4; all contain the fix at commit 136b51f.
2. WORKAROUND: Validate tensor shapes and dtypes before passing to tf.raw_ops in multi-tenant environments; add input sanitization layers at job submission boundaries.
3. ISOLATION: Run ML training workloads in isolated containers with scoped permissions so a crash in one job does not cascade.
4. DETECTION: Monitor for SIGSEGV signals in TF training processes; anomalous crash rates in MaxPool-heavy CNN training are an indicator.
5. AUDIT: Review usage of other tf.raw_ops that accept tensor pairs (orig_input/orig_output pattern) for similar missing-validation issues.
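To act on the PATCH step, it can help to check an installed version string against the fixed releases named in the advisory. The sketch below is one possible check, not an official tool: the names has_fix and parse_release are hypothetical, it accepts only plain X.Y.Z strings, and it assumes branches older than 2.3 never received the fix because the advisory lists only 2.3 through 2.5 as supported.

```python
import re

# First fixed patch release on each supported branch, per the advisory.
FIXED_PATCH_BY_BRANCH = {(2, 3): 4, (2, 4): 3, (2, 5): 1}
# 2.6.0 and every later branch contain the fix.
FIRST_FIXED_BRANCH = (2, 6)


def parse_release(version):
    """Parse a plain 'X.Y.Z' string; reject pre-releases and local tags."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def has_fix(version):
    """Return True if this TensorFlow release includes the MaxPoolGrad fix."""
    major, minor, patch = parse_release(version)
    if (major, minor) >= FIRST_FIXED_BRANCH:
        return True
    fixed_patch = FIXED_PATCH_BY_BRANCH.get((major, minor))
    # Assumption: branches absent from the table (older than 2.3) are unfixed.
    return fixed_patch is not None and patch >= fixed_patch
```

In an environment with TensorFlow installed, the check could be called as has_fix(tf.__version__) for final releases; release candidates and local builds raise ValueError and need a manual review.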

What systems are affected by CVE-2021-37674?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, ML notebook environments, model serving.

What is the CVSS score for CVE-2021-37674?

CVE-2021-37674 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.18%.

What is the AI security impact?

Affected AI Architectures

training pipelines, ML notebook environments, model serving

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0029 Denial of AI Service

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.9.2
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. In affected versions an attacker can trigger a denial of service via a segmentation fault in `tf.raw_ops.MaxPoolGrad` caused by missing validation. The [implementation](https://github.com/tensorflow/tensorflow/blob/460e000de3a83278fb00b61a16d161b1964f15f4/tensorflow/core/kernels/maxpooling_op.cc) misses some validation for the `orig_input` and `orig_output` tensors. The fixes for CVE-2021-29579 were incomplete. We have patched the issue in GitHub commit 136b51f10903e044308cf77117c0ed9871350475. The fix will be included in TensorFlow 2.6.0. We will also cherrypick this commit on TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in supported range.

Exploitation Scenario

An adversary with low-privileged access to a shared ML compute cluster — e.g., a data scientist account on a Jupyter notebook server or a Kubeflow pipeline — submits a TensorFlow job calling tf.raw_ops.MaxPoolGrad with tensors where orig_input and orig_output have mismatched or invalid shapes. TensorFlow dereferences the invalid memory region, triggers a segfault, and the training process crashes. In a shared GPU cluster, this disrupts co-located high-priority model training runs. A malicious insider or compromised data scientist account could repeatedly crash production training jobs to delay model delivery or sabotage model releases.
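One way to contain and observe the crash described in this scenario is to run each untrusted job in its own child process and inspect the exit status. The following is a minimal sketch under stated assumptions: run_isolated and died_from_segfault are hypothetical names, the host is POSIX (where Python's subprocess module reports death by signal as a negative return code), and a real platform would add container or cgroup isolation on top.

```python
import signal
import subprocess


def died_from_segfault(returncode):
    """True if a POSIX child process was terminated by SIGSEGV."""
    # subprocess encodes "killed by signal N" as the return code -N.
    return returncode == -signal.SIGSEGV


def run_isolated(argv, timeout_s=3600):
    """Run a job as a child process and report (returncode, segfaulted).

    A segfault in the child does not take down the supervising process,
    so the supervisor can log it and count crashes per user or per job.
    """
    completed = subprocess.run(argv, timeout=timeout_s, check=False)
    return completed.returncode, died_from_segfault(completed.returncode)
```

A scheduler could call run_isolated with the job's command line and raise an alert when the same account produces repeated segfaults in a short window.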

Weaknesses (CWE)

CWE-1284 — Improper Validation of Specified Quantity in Input: The product receives input that is expected to specify a quantity (such as size or length), but it does not validate or incorrectly validates that the quantity has the required properties.

  • [Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation. [...]

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
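The 5.5 base score can be reproduced from this vector with the CVSS v3.1 base score equations. The sketch below uses the metric weights and the Roundup function from the CVSS v3.1 specification; the function and table names are illustrative, and temporal and environmental metrics are not handled.

```python
import math

AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required weights depend on whether Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, as defined in CVSS v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score for a base-metrics vector string."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are supported")
    m = dict(part.split(":") for part in parts)

    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total if changed else total, 10))
```

For this CVE the impact subscore comes entirely from A:H, and the local attack vector and low privileges requirement hold the exploitability subscore near 1.8, which is why the result lands in the MEDIUM band.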

Timeline

Published
August 12, 2021
Last Modified
November 21, 2024
First Seen
August 12, 2021

Related Vulnerabilities