CVE-2022-21732: TensorFlow: ThreadPoolHandle DoS via memory exhaustion

MEDIUM PoC AVAILABLE
Published February 3, 2022
CISO Take

Any TensorFlow deployment where authenticated users can influence thread pool parameters is at risk of intentional resource exhaustion and service crash. The fix is straightforward: patch to TF 2.8.0, 2.7.1, 2.6.3, or 2.5.3. Prioritize shared training clusters and TF-serving instances accessible to multiple users or external inputs.

What is the risk?

CVSS 6.5 Medium understates operational risk in multi-tenant ML infrastructure. Network-accessible, low-complexity, low-privilege exploitation means any authenticated API consumer can trigger it reliably. No code execution or data exfiltration, but availability impact can cascade to downstream model serving SLAs and batch training pipelines. Risk is elevated in organizations running TensorFlow-as-a-service or shared GPU clusters.

What systems are affected?

Package | Ecosystem | Vulnerable Range | Patched
TensorFlow | pip | not listed | 2.5.3, 2.6.3, 2.7.1, 2.8.0


How severe is it?

CVSS 3.1: 6.5 / 10
EPSS: 0.8% chance of exploitation in 30 days (higher than 50% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

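AV (Attack Vector): Network
AC (Attack Complexity): Low
PR (Privileges Required): Low
UI (User Interaction): None
S (Scope): Unchanged
C (Confidentiality): None
I (Integrity): None
A (Availability): High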

What should I do?

5 steps
  1. Patch immediately: upgrade to TensorFlow 2.8.0, or cherrypicked fixes in 2.7.1, 2.6.3, 2.5.3.

  2. Workaround if patching is blocked: enforce server-side validation of num_threads before passing to TF ops (cap at logical CPU count * reasonable_multiplier).

  3. In multi-tenant environments, restrict direct access to low-level tf.data experimental APIs via API gateway input sanitization.

  4. Detection: alert on OOM events in TF processes that correlate with num_threads values exceeding 10 times the vCPU count.

  5. Audit any user-supplied integer parameters flowing into TF data pipeline constructors.
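Steps 2 and 4 above can be sketched as a small pre-flight check that runs before any user-supplied value reaches a TensorFlow op. This is a minimal illustration, not part of TensorFlow: the function names and the multiplier of 4 are hypothetical choices, and the 10x alert threshold is taken from step 4. The sketch also rejects 0, which is stricter than the non-negative check described in the advisory.

```python
import os

# Hypothetical policy values; tune them for your own hardware and workload.
MAX_THREADS_PER_CPU = 4      # the "reasonable multiplier" from step 2 (assumption)
ALERT_THREADS_PER_CPU = 10   # the 10x alerting threshold from step 4


def cpu_count():
    """Logical CPU count, falling back to 1 if it cannot be determined."""
    return os.cpu_count() or 1


def validate_num_threads(num_threads, cpus=None):
    """Return num_threads unchanged if it is within policy, else raise ValueError."""
    cpus = cpus or cpu_count()
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(num_threads, bool) or not isinstance(num_threads, int):
        raise ValueError("num_threads must be an integer")
    limit = cpus * MAX_THREADS_PER_CPU
    if not 1 <= num_threads <= limit:
        raise ValueError(
            f"num_threads must be between 1 and {limit}, got {num_threads}"
        )
    return num_threads


def should_alert(num_threads, cpus=None):
    """True when a requested thread count exceeds the 10x-vCPU alert threshold."""
    cpus = cpus or cpu_count()
    return num_threads > cpus * ALERT_THREADS_PER_CPU
```

Calling `validate_num_threads` at the API boundary keeps the bound in one place, so it holds even when the underlying TensorFlow release has not yet been upgraded.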

What does CISA's SSVC say?

Decision: Track
Exploitation: none
Automatable: No
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001: A.9.2 - AI system availability and resilience
NIST AI RMF: MANAGE 2.2 - Mechanisms are in place to sustain reliable AI system operation
OWASP LLM Top 10: LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2022-21732?

CVE-2022-21732 is a denial-of-service vulnerability in TensorFlow's `ThreadPoolHandle` implementation. The `num_threads` argument is checked only for being non-negative, with no upper bound, so a very large value makes TensorFlow allocate too much memory. Any deployment where authenticated users can influence thread pool parameters is at risk of resource exhaustion and service crash. Fixes are available in TF 2.8.0, 2.7.1, 2.6.3, and 2.5.3.

Is CVE-2022-21732 actively exploited?

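No active exploitation is recorded: CISA's SSVC assessment lists exploitation as none. However, proof-of-concept exploit code for CVE-2022-21732 is publicly available (indexed in trickest/cve), which increases the risk of exploitation.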

How to fix CVE-2022-21732?

  1. Patch immediately: upgrade to TensorFlow 2.8.0, or cherrypicked fixes in 2.7.1, 2.6.3, 2.5.3.

  2. Workaround if patching is blocked: enforce server-side validation of num_threads before passing to TF ops (cap at logical CPU count * reasonable_multiplier).

  3. In multi-tenant environments, restrict direct access to low-level tf.data experimental APIs via API gateway input sanitization.

  4. Detection: alert on OOM events in TF processes that correlate with num_threads values exceeding 10 times the vCPU count.

  5. Audit any user-supplied integer parameters flowing into TF data pipeline constructors.

What systems are affected by CVE-2022-21732?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, data preprocessing pipelines, shared ML compute clusters.

What is the CVSS score for CVE-2022-21732?

CVE-2022-21732 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.75%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, data preprocessing pipelines, shared ML compute clusters

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service
AML.T0034 Cost Harvesting
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art. 15
ISO 42001: A.9.2
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

Tensorflow is an Open Source Machine Learning Framework. The implementation of `ThreadPoolHandle` can be used to trigger a denial of service attack by allocating too much memory. This is because the `num_threads` argument is only checked to not be negative, but there is no upper bound on its value. The fix will be included in TensorFlow 2.8.0. We will also cherrypick this commit on TensorFlow 2.7.1, TensorFlow 2.6.3, and TensorFlow 2.5.3, as these are also affected and still in supported range.
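The fixed releases named in the advisory can be turned into a simple version check. The sketch below is illustrative only: `has_fix` and `parse_version` are hypothetical helper names, release lines not named in the advisory (for example anything older than 2.5) are assumed to be unfixed, and version strings with pre-release or local suffixes are rejected rather than guessed at.

```python
import re

# Fixed patch level per release line, as named in the advisory.
FIXED_PATCH = {(2, 5): 3, (2, 6): 3, (2, 7): 1}
# From this release line onward the fix is assumed to be included.
FIRST_FIXED_LINE = (2, 8)


def parse_version(version):
    """Parse a plain 'X.Y.Z' version string into a tuple of ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if not match:
        raise ValueError(f"unsupported version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(version):
    """True if the given TensorFlow version includes the num_threads fix."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= FIRST_FIXED_LINE:
        return True
    fixed = FIXED_PATCH.get((major, minor))
    # Release lines not named in the advisory are treated as unfixed (assumption).
    return fixed is not None and patch >= fixed
```

In an inventory script this could be applied to the value of `tf.__version__` on each host, keeping in mind that strings with suffixes raise `ValueError` and need manual review.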

Exploitation Scenario

An adversary with low-privilege access to a shared TensorFlow training platform or model-serving API submits a crafted tf.data pipeline definition that embeds a ThreadPoolHandle with num_threads set to a value near INT_MAX (2147483647). TensorFlow validates only that the value is non-negative and proceeds to allocate memory proportional to the thread count. The process exhausts available RAM within seconds and is OOM-killed. In a Kubernetes-managed serving cluster, this crashes the pod; in a shared training cluster, it kills the worker process and may corrupt in-progress checkpoints. The attack requires only valid API credentials and knowledge of the tf.data API surface, which is publicly documented.

Weaknesses (CWE)

CWE-770 — Allocation of Resources Without Limits or Throttling: The product allocates a reusable resource or group of resources on behalf of an actor without imposing any intended restrictions on the size or number of resources that can be allocated.

  • [Requirements] Clearly specify the minimum and maximum expectations for capabilities, and dictate which behaviors are acceptable when resource allocation reaches limits.
  • [Architecture and Design] Limit the amount of resources that are accessible to unprivileged users. Set per-user limits for resources. Allow the system administrator to define these limits. Be careful to avoid CWE-410.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
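To show how this vector yields the 6.5 base score, the following sketch applies the CVSS v3.1 base-score equations to it. The metric weights and the rounding rule follow the published CVSS v3.1 specification; the function names are illustrative, and the sketch deliberately handles only Scope: Unchanged vectors such as this one.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged values).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def parse_vector(vector):
    """Split 'CVSS:3.1/AV:N/...' into a {metric: value} dict."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    return dict(part.split(":", 1) for part in parts)


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a Scope: Unchanged CVSS v3.1 vector."""
    m = parse_vector(vector)
    if m["S"] != "U":
        raise ValueError("this sketch handles only Scope: Unchanged")
    w = {name: WEIGHTS[name][m[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 6.5
```

For this vector the impact sub-score is 6.42 * 0.56 ≈ 3.60 and the exploitability sub-score is about 2.84; their sum, about 6.43, rounds up to 6.5.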

Timeline

Published: February 3, 2022
Last Modified: November 21, 2024
First Seen: February 3, 2022

Related Vulnerabilities