CVE-2025-63396: PyTorch: profiler crash or hang when profiler.stop() is omitted

LOW PoC AVAILABLE
Published November 12, 2025
CISO Take

Low-severity local DoS in PyTorch's profiling subsystem (v2.5, v2.7.1). It is not a production threat, but it is a real risk in shared ML compute environments, where omitting profiler.stop() can crash or hang training jobs during teardown. Enforce the context manager pattern ('with torch.profiler.profile(...)') in all internal ML code immediately as a zero-cost workaround, and track the upstream fix on issue #156563 so that patched PyTorch can be rolled out to training infrastructure once it is available.

What is the risk?

Low risk overall. The CVSS score of 3.3, with a local attack vector and limited availability impact, accurately captures the threat envelope. The vulnerability is only reachable in environments where profiling is active: typically development, performance-tuning, or automated hyperparameter search pipelines, not production inference. Risk escalates meaningfully in shared HPC clusters or multi-tenant ML platforms, where a hanging profiler session can hold shared GPU/CPU resources and impact co-located workloads. There is no confidentiality or integrity impact. The CVE is not in CISA KEV and there is no known active exploitation.

What systems are affected?

Package: PyTorch
Ecosystem: pip
Vulnerable Range: (not listed)
Patched: No patch
100.9K · OpenSSF 6.4 · 22.7K dependents · Pushed 3d ago · 11% patched · ~216d to patch

Do you use PyTorch? You may be affected if your code starts torch.profiler.profile without a 'with' block or an explicit stop() call.

How severe is it?

CVSS 3.1: 3.3 / 10
EPSS: 0.1% chance of exploitation in 30 days (higher than 2% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

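Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): Low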

What should I do?

5 steps
  1. Immediate workaround (zero cost): Mandate use of torch.profiler.profile exclusively as a context manager ('with' statement). This guarantees stop() is called on all exit paths, including exceptions.

  2. Code audit: Scan the codebase for torch.profiler.profile instantiated outside a 'with' statement, for example with a grep or semgrep rule that flags 'torch.profiler.profile(' when it is not preceded by 'with'.

  3. CI gate: Add lint check to block merges with non-context-manager profiler usage.

  4. Patch: Upgrade PyTorch when a fix is confirmed via pytorch/pytorch#156563; prioritize training infrastructure over dev environments.

  5. Detection: Alert on training jobs stuck in the finalization phase for more than 5 minutes, which may signal a profiler hang.
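Steps 2 and 3 could be sketched with Python's standard ast module instead of a plain grep. The helper names below (find_unmanaged_profilers, main) are illustrative and not part of PyTorch or any existing linter. The sketch only recognizes the dotted spelling ending in 'profiler.profile'; it will miss 'from torch.profiler import profile' followed by a bare profile(...) call, and it will flag a profiler that is created first and entered with 'with' later.

```python
import ast

TARGET = "profiler.profile"  # matches torch.profiler.profile(...) and profiler.profile(...)


def dotted_name(node):
    """Return 'a.b.c' for a chain of Attribute/Name nodes, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_unmanaged_profilers(source, filename="<string>"):
    """Return line numbers of profiler.profile(...) calls that are not 'with' items."""
    tree = ast.parse(source, filename=filename)
    # Calls used directly as the context expression of a with / async with statement.
    managed = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.With, ast.AsyncWith)):
            for item in node.items:
                managed.add(id(item.context_expr))
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and id(node) not in managed:
            name = dotted_name(node.func)
            if name and (name == TARGET or name.endswith("." + TARGET)):
                hits.append(node.lineno)
    return sorted(hits)


def main(paths):
    """Print findings for each file; return 1 if any were found, else 0."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for lineno in find_unmanaged_profilers(handle.read(), path):
                print(f"{path}:{lineno}: torch.profiler.profile used outside 'with'")
                failed = True
    return 1 if failed else 0
```

A CI job could call main() with the list of changed Python files and use the return value as the process exit code, so that a non-zero result blocks the merge.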

What does CISA's SSVC say?

Decision: Track
Exploitation: none
Automatable: No
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2 - AI system lifecycle management
A.9.4 - AI system resilience and robustness
NIST AI RMF
GOVERN 1.2 - Policies, processes and practices are in place to map, measure, and manage AI risks
MANAGE 2.2 - Mechanisms are in place to address residual risks

Frequently Asked Questions

What is CVE-2025-63396?

CVE-2025-63396 is a low-severity local denial-of-service issue in PyTorch's profiling subsystem, reported against v2.5 and v2.7.1. If profiler.stop() is never called, torch.profiler.profile (PythonTracer) can crash or hang during finalization. It is not a production threat, but it is a real risk in shared ML compute environments where a hung training job holds resources during teardown.

Is CVE-2025-63396 actively exploited?

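No active exploitation is known: CVE-2025-63396 is not in CISA KEV, and CISA's SSVC assessment lists exploitation as "none". Proof-of-concept code is publicly available (indexed in trickest/cve), which lowers the bar for anyone who already has local access.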

How do I fix CVE-2025-63396?

  1. Immediate workaround (zero cost): Mandate use of torch.profiler.profile exclusively as a context manager ('with' statement). This guarantees stop() is called on all exit paths, including exceptions.

  2. Code audit: Scan the codebase for torch.profiler.profile instantiated outside a 'with' statement, for example with a grep or semgrep rule that flags 'torch.profiler.profile(' when it is not preceded by 'with'.

  3. CI gate: Add a lint check to block merges with non-context-manager profiler usage.

  4. Patch: Upgrade PyTorch when a fix is confirmed via pytorch/pytorch#156563; prioritize training infrastructure over dev environments.

  5. Detection: Alert on training jobs stuck in the finalization phase for more than 5 minutes, which may signal a profiler hang.

What systems are affected by CVE-2025-63396?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, MLOps platforms, shared ML compute clusters, model development environments.

What is the CVSS score for CVE-2025-63396?

CVE-2025-63396 has a CVSS v3.1 base score of 3.3 (LOW). The EPSS exploitation probability is 0.11%.

What is the AI security impact?

Affected AI Architectures

training pipelines, MLOps platforms, shared ML compute clusters, model development environments

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0011 User Execution
AML.T0029 Denial of AI Service

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.2, A.9.4
NIST AI RMF: GOVERN 1.2, MANAGE 2.2

What are the technical details?

Original Advisory

An issue was discovered in PyTorch v2.5 and v2.7.1. Omission of profiler.stop() can cause torch.profiler.profile (PythonTracer) to crash or hang during finalization, leading to a Denial of Service (DoS).
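The cleanup guarantee that avoids the problem can be illustrated without PyTorch installed. The sketch below uses a stand-in class (FakeProfiler, purely hypothetical) that exposes the same start()/stop() surface, and a small helper (profiling, also hypothetical) that wraps any such object in try/finally. With real PyTorch, the direct form 'with torch.profiler.profile(...) as prof:' gives the same guarantee and should be preferred.

```python
from contextlib import contextmanager


@contextmanager
def profiling(profiler):
    """Start `profiler` and guarantee stop() runs, even if the body raises."""
    profiler.start()
    try:
        yield profiler
    finally:
        profiler.stop()


class FakeProfiler:
    """Stand-in with start()/stop() so the sketch runs without PyTorch."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")


prof = FakeProfiler()
try:
    with profiling(prof):
        raise RuntimeError("simulated failure inside the training step")
except RuntimeError:
    pass

print(prof.events)  # ['start', 'stop']
```

The point of the pattern is that stop() is reached on the exception path too, which is the path where a manual stop() call is most often forgotten.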

Exploitation Scenario

A malicious insider or compromised CI pipeline submits a training script to a shared ML repository with profiling enabled and no stop() call in the exception handler. When the automated training platform processes a specific dataset condition that triggers the exception path, the profiler hangs during Python finalization. On a shared GPU cluster with resource isolation gaps, this holds the allocated node indefinitely, denying training capacity to other teams until the job is forcibly killed. In practice the condition is more often triggered by accident than by an attacker: a developer forgets profiler cleanup in async training code, causing intermittent job failures that burn engineering time to debug.
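One way to bound the damage in the scenario above is a supervisor that gives a job a fixed teardown budget and kills it afterwards. The sketch below is a minimal illustration, not a scheduler feature: the marker line TRAINING_DONE and the function watch_job are assumptions, and it presumes the job prints the marker on stdout when training ends and stays quiet during teardown.

```python
import subprocess
import sys

FINALIZE_MARKER = "TRAINING_DONE"  # hypothetical line the job prints right before teardown


def watch_job(cmd, finalize_timeout_s=300.0):
    """Return 'exited' if the job ends in time, 'killed' if teardown overran the limit."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        # Block until the job announces that training is over (or stdout closes).
        for line in proc.stdout:
            if line.strip() == FINALIZE_MARKER:
                break
        try:
            proc.wait(timeout=finalize_timeout_s)
            return "exited"
        except subprocess.TimeoutExpired:
            proc.kill()  # forcibly free the node for other workloads
            proc.wait()
            return "killed"
    finally:
        proc.stdout.close()


# Simulated job that finishes training and then hangs during teardown.
hanging_job = [
    sys.executable,
    "-c",
    "import time; print('TRAINING_DONE', flush=True); time.sleep(60)",
]
print(watch_job(hanging_job, finalize_timeout_s=1.0))  # killed
```

The default of 300 seconds mirrors the 5-minute alert threshold suggested in the remediation steps; a "killed" result is the event worth alerting on.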

Weaknesses (CWE)

CWE-667: Improper Locking. The product does not properly acquire or release a lock on a resource, leading to unexpected resource state changes and behaviors.

  • [Implementation] Use industry standard APIs to implement locking mechanism.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:L
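For readers who want to see how this vector yields the 3.3 base score, the following sketch applies the CVSS v3.1 base-score equations. It is deliberately partial: only Scope: Unchanged is handled, and the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:L"))  # 3.3
```

Here the impact sub-score is 6.42 × 0.22 ≈ 1.41 and the exploitability sub-score is 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.83; their sum of about 3.25 rounds up to 3.3.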

Timeline

Published: November 12, 2025
Last Modified: January 2, 2026
First Seen: November 12, 2025

Related Vulnerabilities