CVE-2025-33244: Deserialization enables RCE

Q: Is CVE-2025-33244 actively exploited?

No confirmed active exploitation of CVE-2025-33244 has been reported, but organizations should still patch proactively.

Q: How to fix CVE-2025-33244?

Immediate (0-24h):
(1) Audit all NVIDIA APEX deployments: inventory every environment running PyTorch < 2.6 by running 'pip show torch' on each ML node and in each container image.
(2) Block unnecessary lateral traffic to and from GPU training nodes at the network layer.

Short-term (24-72h):
(3) Upgrade PyTorch to 2.6+ in all training environments; this is the primary mitigation per the NVIDIA advisory.
(4) Update NVIDIA APEX to the latest release from GitHub (nvidia/apex).
(5) Restrict deserialization of external checkpoint files: validate their sources and verify a cryptographic signature before loading. torch.save does not sign its output, so the signature (for example an HMAC over the file) has to be produced and checked in a separate step.

Detection:
(6) Alert on unusual processes spawned from Python/APEX worker processes, unexpected outbound connections from training nodes, and anomalous filesystem writes in model checkpoint directories.

Longer-term:
(7) Implement network segmentation that isolates GPU training clusters from the broader corporate network.
(8) Enforce image scanning in CI/CD that validates the PyTorch version before training jobs are deployed.
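The version audit in step (1) can be scripted once the output of 'pip show torch' has been collected from each node. The following is a minimal sketch, not a vetted scanner: the function names are hypothetical, it only compares the major and minor version, and it assumes that any 2.6 build (including pre-releases) counts as patched, which may not hold for your environment.

```python
import re
from typing import Optional, Tuple

# Assumption: 2.6 is the first release line the advisory treats as unaffected.
PATCHED = (2, 6)


def version_from_pip_show(output: str) -> Optional[str]:
    """Extract the 'Version:' field from the text printed by 'pip show torch'."""
    for line in output.splitlines():
        if line.lower().startswith("version:"):
            return line.split(":", 1)[1].strip()
    return None


def parse_release(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings such as '2.5.1+cu121'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(version: str) -> bool:
    """True when the PyTorch version is earlier than 2.6."""
    return parse_release(version) < PATCHED


if __name__ == "__main__":
    sample = "Name: torch\nVersion: 2.5.1+cu121\nSummary: Tensors and more\n"
    found = version_from_pip_show(sample)
    print(found, "affected" if is_affected(found) else "not affected")
```

The same check can run inside a CI gate for step (8) by feeding it the version reported by the container image under test.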

Q: What systems are affected by CVE-2025-33244?

This vulnerability affects the following AI/ML architecture patterns: distributed training pipelines, GPU cluster environments, MLOps platforms, model fine-tuning infrastructure, checkpoint storage and management systems.

Q: What is the CVSS score for CVE-2025-33244?

CVE-2025-33244 has a CVSS v3.1 base score of 9.0 (CRITICAL). The EPSS exploitation probability is 0.58%.

CISO Take

If your ML teams run distributed training or fine-tuning with NVIDIA APEX on PyTorch < 2.6, you have a CVSS 9.0 deserialization RCE sitting on your GPU cluster — patch PyTorch to 2.6+ immediately. Adjacent-network attack vector means anyone on the same VPC, shared GPU cluster, or corporate LAN can exploit this with low privileges and zero user interaction. The blast radius is your entire training infrastructure: model weights, training data, and GPU credentials.

What is the risk?

Critical risk for organizations with active ML training infrastructure. CVSS 9.0 with Scope:Changed means a successful exploit crosses from the APEX process into the broader system. Low attack complexity and no user interaction mean an attacker who already holds an adjacent foothold needs little effort to exploit it. The adjacent-network vector partially mitigates the risk, because the flaw cannot be reached directly from the internet, but cloud VPCs, Kubernetes GPU clusters, and shared HPC environments effectively collapse that boundary: any co-tenant or compromised adjacent workload is a viable attacker position. No active exploitation has been confirmed yet, but deserialization exploits (CWE-502) have a mature toolset and a short weaponization timeline.

How severe is it?

CVSS 3.1: 9.0 / 10

EPSS: 0.6% chance of exploitation in 30 days (higher than 43% of all CVEs)

Source: EPSS v3, FIRST.org

Exploitation Status: No known exploitation

Sophistication: Moderate

What is the attack surface?

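Attack Vector (AV): Adjacent

Attack Complexity (AC): Low

Privileges Required (PR): Low

User Interaction (UI): None

Scope (S): Changed

Confidentiality (C): High

Integrity (I): High

Availability (A): High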
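To see how these eight metrics produce the 9.0 base score, the CVSS v3.1 base equations can be evaluated directly. The sketch below uses the metric weights and rounding rule published in the FIRST.org CVSS v3.1 specification; the vector string is assembled from the values listed above, and the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (FIRST.org).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a vector string."""
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"

    # Impact sub-score.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score; the PR weight depends on Scope.
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H"))
```

With Scope set to Changed, the Low privileges weight rises from 0.62 to 0.68 and the combined score is multiplied by 1.08, which is why an adjacent-network flaw still reaches 9.0.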

What does CISA's SSVC say?

Decision: Track

Exploitation: none

Automatable: No

Technical Impact: total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Code Execution, Supply Chain, Data Extraction, Framework, Training Data

AML.T0010.001 - AI Software
AML.T0018.002 - Embed Malware
AML.T0020 - Poison Training Data
AML.T0025 - Exfiltration via Cyber Means
AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, Robustness and Cybersecurity

ISO 42001

A.6.2.5 - AI System Security
A.8.2 - AI System Components and Suppliers

NIST AI RMF

GOVERN 4.2 - Organizational teams are committed to governance
MANAGE 2.2 - Mechanisms to manage AI risks

OWASP LLM Top 10

LLM03 - Supply Chain Vulnerabilities

What is the AI security impact?

Affected AI Architectures

distributed training pipelines, GPU cluster environments, MLOps platforms, model fine-tuning infrastructure, checkpoint storage and management systems

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0018.002 Embed Malware

AML.T0020 Poison Training Data

AML.T0025 Exfiltration via Cyber Means

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: A.6.2.5, A.8.2

NIST AI RMF: GOVERN 4.2, MANAGE 2.2

OWASP LLM Top 10: LLM03

What are the technical details?

Original Advisory

NVIDIA APEX for Linux contains a vulnerability where an unauthorized attacker could cause a deserialization of untrusted data. This vulnerability affects environments that use PyTorch versions earlier than 2.6. A successful exploit of this vulnerability might lead to code execution, denial of service, escalation of privileges, data tampering, and information disclosure.

Exploitation Scenario

An attacker with low-privilege access on the same network segment (e.g., a compromised ML workstation, a co-tenant in a cloud GPU cluster, or a malicious insider) crafts a serialized payload using Python's pickle deserialization primitives — a well-understood technique with public PoC templates. They deliver it to an APEX-enabled training process either by poisoning a shared model checkpoint repository (NFS mount, S3 bucket with permissive policies) or by injecting it directly over the network during distributed training communication (NCCL/GLOO). When the APEX process deserializes the payload on PyTorch < 2.6, arbitrary code executes in the training environment. The attacker establishes persistence, exfiltrates model weights and training data to external storage, and optionally poisons the model to introduce a backdoor — all while the training job appears to continue normally.
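The scenario works because unpickling can import and call arbitrary globals. A common defensive pattern, described in the Python pickle documentation, is to override Unpickler.find_class with an allowlist. The sketch below is illustrative only: the allowlist contents and the Probe class are hypothetical, the probe calls the harmless built-in len rather than anything dangerous, and this is not a description of how APEX or PyTorch implement their own loading. PyTorch 2.6 changed the default of torch.load to weights_only=True, which applies a comparable allowlist idea.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist: only these (module, name) globals may be resolved.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside the allowlist."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in analogue of pickle.loads() that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Probe:
    """Stand-in for a malicious object: unpickling it would call len('abc')."""

    def __reduce__(self):
        return (len, ("abc",))


if __name__ == "__main__":
    benign = pickle.dumps(OrderedDict(step=10))
    print(restricted_loads(benign))

    try:
        restricted_loads(pickle.dumps(Probe()))
    except pickle.UnpicklingError as exc:
        print("blocked:", exc)
```

An allowlist narrows what a payload can reach but does not make untrusted pickles safe in general, so it complements source validation and patching rather than replacing them.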

Weaknesses (CWE)

CWE-502 (primary), Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

[Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
[Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
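The HMAC mitigation above can be sketched with Python's standard library: compute a tag over the checkpoint bytes when the file is written, and refuse to deserialize unless the tag verifies. The function names and the loader callback are hypothetical, and key storage and distribution are assumed to be handled elsewhere (for example by a secrets manager).

```python
import hashlib
import hmac
from typing import Any, Callable


def sign_checkpoint(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the serialized checkpoint bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_checkpoint(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Check the tag with a constant-time comparison."""
    actual = sign_checkpoint(data, key)
    return hmac.compare_digest(actual, expected_tag)


def load_verified(data: bytes, key: bytes, expected_tag: str,
                  loader: Callable[[bytes], Any]) -> Any:
    """Call the deserializer only after the HMAC has been verified."""
    if not verify_checkpoint(data, key, expected_tag):
        raise ValueError("checkpoint signature mismatch; refusing to load")
    return loader(data)


if __name__ == "__main__":
    key = b"example-key-from-a-secrets-manager"  # placeholder value
    blob = b"serialized checkpoint bytes"
    tag = sign_checkpoint(blob, key)
    print(verify_checkpoint(blob, key, tag))         # untampered
    print(verify_checkpoint(blob + b"x", key, tag))  # tampered
```

Verification has to happen on the raw bytes before any deserializer touches them; checking a signature after loading would be too late, because the payload executes during deserialization.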

Source: MITRE CWE corpus.