CVE-2025-2953: PyTorch: DoS via mkldnn_max_pool2d resource leak

GHSA-3749-ghw9-m3mg | Severity: MEDIUM | PoC available | CISA SSVC: Track*
Published March 30, 2025

CISO Take

Low-priority patching item for teams running PyTorch locally. This is a local-only DoS requiring existing low-privileged access, making it an insider threat or post-exploitation primitive rather than a remote attack surface. Update to torch 2.7.1-rc1 when stable; no emergency response required unless your ML workstations are shared or multi-tenant.

What is the risk?

Risk is LOW in most enterprise AI deployments. A CVSS score of 5.5 with a local attack vector means an adversary must already have a foothold on the target machine. An EPSS score of 0.00151 indicates a low likelihood of exploitation. The vulnerability's existence is disputed by maintainers, and PyTorch's own security policy frames it as expected behavior when executing untrusted model code. Risk is elevated only in multi-tenant ML training environments or shared GPU clusters where workload isolation is weak.

What systems are affected?

Package    Ecosystem    Vulnerable Range    Patched
PyTorch    pip          -                   No patch
PyTorch    pip          < 2.7.1-rc1         2.7.1-rc1

Package stats (PyTorch): 100.9K, OpenSSF 6.4, 22.7K dependents, pushed 3d ago, 11% patched, ~216d to patch

How severe is it?

CVSS 3.1: 5.5 / 10
EPSS: 0.2% chance of exploitation in 30 days (higher than 12% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

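AV (Attack Vector): Local
AC (Attack Complexity): Low
PR (Privileges Required): Low
UI (User Interaction): None
S (Scope): Unchanged
C (Confidentiality): None
I (Integrity): None
A (Availability): High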

What should I do?

5 steps
  1. Patch: upgrade to torch>=2.7.1-rc1 once it is available in a stable channel.

  2. Immediate workaround: disable MKLDNN via TORCH_BACKENDS_MKLDNN_ENABLED=0 or torch.backends.mkldnn.enabled=False if MKLDNN acceleration is not required.

  3. Model provenance: enforce signed/verified model artifacts per PyTorch security policy — never execute models from untrusted sources.

  4. Isolation: run inference and training jobs in containers or VMs with resource limits (cgroups) to bound DoS blast radius.

  5. Detection: monitor for abnormal process termination or OOM events in PyTorch worker processes.
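
Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not an official check: `parse_version`, `is_patched`, `installed_torch_version`, and `disable_mkldnn` are hypothetical helper names, and the version parser is a simplified stand-in for a full PEP 440 parser that only understands plain releases and `rc` tags. The sketch uses only the `torch.backends.mkldnn.enabled` attribute; the environment variable named in step 2 is not exercised here. Whether the flag also blocks direct calls to torch.mkldnn_max_pool2d is not stated in the advisory.

```python
import re
from importlib import metadata

PATCHED = "2.7.1-rc1"  # first patched release named in this advisory


def parse_version(text):
    """Return a sortable tuple for simple torch version strings.

    Handles forms such as '2.6.0+cu124', '2.7.1rc1' and '2.7.1-rc1'.
    Anything else (dev or nightly builds, for example) raises ValueError.
    """
    text = text.split("+", 1)[0]  # drop a local build tag such as +cu124
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.(\d+))?(?:[-_.]?rc(\d+))?", text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor, patch, rc = match.groups()
    release = (int(major), int(minor), int(patch or 0))
    # A final release sorts after any release candidate of the same number.
    return release + ((1, 0) if rc is None else (0, int(rc)))


def is_patched(version, patched=PATCHED):
    """True if `version` is at or above the patched release."""
    return parse_version(version) >= parse_version(patched)


def installed_torch_version():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


def disable_mkldnn():
    """Turn off the MKLDNN backend flag (step 2).

    Returns the new flag value, or None if torch cannot be imported.
    """
    try:
        import torch
    except ImportError:
        return None
    torch.backends.mkldnn.enabled = False
    return torch.backends.mkldnn.enabled
```

A deployment script could call `installed_torch_version()`, pass the result to `is_patched`, and fall back to `disable_mkldnn()` when the check returns False.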

What does CISA's SSVC say?

Decision: Track*
Exploitation: poc
Automatable: No
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2 - AI system robustness and resilience
NIST AI RMF
MANAGE 4.1 - Risk Treatment — Residual Risk Response

Frequently Asked Questions

What is CVE-2025-2953?

CVE-2025-2953 is a local denial-of-service vulnerability in the torch.mkldnn_max_pool2d function of PyTorch 2.6.0+cu124, classified as CWE-404 (Improper Resource Shutdown or Release). Exploitation requires existing low-privileged local access, and the maintainers dispute that the issue is a vulnerability. It is a low-priority patching item: update to torch 2.7.1-rc1 when stable; no emergency response is required unless your ML workstations are shared or multi-tenant.

Is CVE-2025-2953 actively exploited?

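No active exploitation is recorded here: CISA's SSVC assessment lists the exploitation status as "poc". Proof-of-concept exploit code is publicly available for CVE-2025-2953, which increases the risk of exploitation.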

How to fix CVE-2025-2953?

  1. Patch: upgrade to torch>=2.7.1-rc1 when available in stable channel.

  2. Immediate workaround: disable MKLDNN via TORCH_BACKENDS_MKLDNN_ENABLED=0 or torch.backends.mkldnn.enabled=False if MKLDNN acceleration is not required.

  3. Model provenance: enforce signed/verified model artifacts per PyTorch security policy; never execute models from untrusted sources.

  4. Isolation: run inference and training jobs in containers or VMs with resource limits (cgroups) to bound DoS blast radius.

  5. Detection: monitor for abnormal process termination or OOM events in PyTorch worker processes.
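
The isolation and detection steps can be approximated without container tooling. The sketch below is an assumption-laden illustration rather than a recommended setup: it swaps cgroups for a per-process address-space cap set with the standard-library `resource` module (POSIX only), and the helper names `limit_memory`, `describe_exit`, and `run_job` are hypothetical. The default memory cap and timeout are placeholder values.

```python
import resource
import signal
import subprocess
import sys


def limit_memory(max_bytes):
    """Return a hook that caps the child's address space before it starts."""
    def apply():
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # Never ask for more than the existing hard limit allows.
        cap = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, hard))
    return apply


def describe_exit(returncode):
    """Classify a subprocess return code for logging or alerting."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_job(argv, max_bytes=8 * 1024**3, timeout=3600):
    """Run a worker command under a memory cap and report how it ended."""
    proc = subprocess.run(argv, preexec_fn=limit_memory(max_bytes), timeout=timeout)
    return describe_exit(proc.returncode)
```

For example, `run_job([sys.executable, "train.py"])` (with `train.py` standing in for any worker script) returns a short status string that a supervisor can log or alert on when it is not "ok".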

What systems are affected by CVE-2025-2953?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, batch inference.

What is the CVSS score for CVE-2025-2953?

CVE-2025-2953 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.22%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, batch inference

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0011.000 Unsafe AI Artifacts
AML.T0029 Denial of AI Service

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.2
NIST AI RMF: MANAGE 4.1

What are the technical details?

Original Advisory

A vulnerability, which was classified as problematic, has been found in PyTorch 2.6.0+cu124. Affected by this issue is the function torch.mkldnn_max_pool2d. The manipulation leads to denial of service. An attack has to be approached locally. The exploit has been disclosed to the public and may be used. The real existence of this vulnerability is still doubted at the moment. The security policy of the project warns to use unknown models which might establish malicious effects.

Exploitation Scenario

An adversary with access to a shared ML training cluster (e.g., a data scientist with low-privileged shell access) crafts a PyTorch model or script that invokes torch.mkldnn_max_pool2d with malformed tensor dimensions or invalid parameters, triggering improper resource release and crashing the target process. In a multi-tenant GPU cluster, this could disrupt co-located training jobs. Alternatively, in a supply chain scenario, a threat actor embeds malicious tensor operations in a publicly published model checkpoint on Hugging Face; a victim downloads and runs it locally, triggering the DoS during model initialization or the forward pass.

Weaknesses (CWE)

CWE-404 — Improper Resource Shutdown or Release: The product does not release or incorrectly releases a resource before it is made available for re-use.

  • [Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, languages such as Java, Ruby, and Lisp perform automatic garbage collection that releases memory for objects that have been deallocated.
  • [Implementation] It is good practice to be responsible for freeing all resources you allocate and to be consistent with how and where you free memory in a function. If you allocate memory that you intend to free upon completion of the function, you must be sure to free the memory at all exit points for that function including error conditions.
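
The implementation guidance above (release at every exit point, including error paths) can be illustrated with a small Python sketch. It is a generic example of the pattern, not PyTorch's code: `ScratchBuffer`, `scratch`, and `pool_window` are hypothetical names, and the buffer merely stands in for a native resource.

```python
from contextlib import contextmanager


class ScratchBuffer:
    """Hypothetical resource standing in for a native buffer or handle."""

    live = 0  # number of buffers currently allocated

    def __init__(self, size):
        self.data = bytearray(size)
        ScratchBuffer.live += 1

    def release(self):
        # Idempotent: releasing twice does not double-count.
        if self.data is not None:
            self.data = None
            ScratchBuffer.live -= 1


@contextmanager
def scratch(size):
    """Allocate a buffer and guarantee its release on every exit path."""
    buf = ScratchBuffer(size)
    try:
        yield buf
    finally:
        buf.release()  # runs on normal return and when an exception is raised


def pool_window(values, kernel):
    """Non-overlapping max pooling over a list, holding a buffer while it runs."""
    with scratch(len(values)):
        if kernel <= 0 or kernel > len(values):
            # Error path: the finally clause above still releases the buffer.
            raise ValueError("kernel size out of range")
        last_start = len(values) - kernel
        return [max(values[i:i + kernel]) for i in range(0, last_start + 1, kernel)]
```

After both a successful call and a call that raises `ValueError`, `ScratchBuffer.live` returns to 0, which is the property CWE-404 is concerned with.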

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
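
To show how the vector maps to the 5.5 base score, the following sketch applies the CVSS v3.1 base-score equations. It covers only Scope: Unchanged vectors such as this one; the function names are illustrative, and the metric weights and rounding rule follow the CVSS v3.1 specification.

```python
import math

# CVSS v3.1 metric weights (PR values are those for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    metrics = dict(part.split(":", 1) for part in parts)
    if metrics.get("S") != "U":
        raise ValueError("this sketch only covers Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # prints 5.5
```

Here the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is about 1.83; their sum of roughly 5.43 rounds up to 5.5.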

Timeline

Published: March 30, 2025
Last Modified: May 30, 2025
First Seen: March 30, 2025

Related Vulnerabilities