CVE-2025-24357: vLLM: unsafe deserialization RCE via model loading

GHSA-rh4j-5rhw-hr54 HIGH
Published January 27, 2025
CISO Take

vLLM's model loading function uses torch.load with pickle deserialization enabled by default, allowing arbitrary code execution when loading a malicious model checkpoint. Any organization running vLLM < 0.7.0 to serve LLMs—especially pulling models from HuggingFace—is at risk of full inference server compromise. Upgrade to vLLM >= 0.7.0 immediately; if that's not possible, restrict model sources to internally verified checkpoints only.

Risk Assessment

High risk for production AI/ML environments. CVSS 8.8 reflects a low-complexity network attack requiring only user interaction (loading a model file). Pickle deserialization RCE is a well-understood, trivially weaponizable class of vulnerability—no AI/ML expertise required to craft a payload. The real danger is the supply chain vector: ML engineers routinely download and load models from HuggingFace without security scrutiny, normalizing exactly the behavior this vulnerability exploits. An EPSS score of 1% indicates a low predicted likelihood of near-term exploitation, but the attack surface is broad given vLLM's widespread adoption in LLM serving infrastructure.

Affected Systems

Package Ecosystem Vulnerable Range Patched
vllm pip < 0.7.0 0.7.0

Severity & Risk

CVSS 3.1
8.8 / 10
EPSS
1.0%
chance of exploitation in 30 days
Higher than 77% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Trivial

Attack Surface

AV (Attack Vector): Network
AC (Attack Complexity): Low
PR (Privileges Required): None
UI (User Interaction): Required
S (Scope): Unchanged
C (Confidentiality): High
I (Integrity): High
A (Availability): High

Recommended Action

  1. PATCH

    Upgrade vLLM to >= 0.7.0 immediately (fix sets weights_only=True in torch.load, removing pickle execution path).

  2. INVENTORY

    Identify all vLLM instances and their current versions across dev, staging, and production.

  3. INTERIM WORKAROUND

    If patching is delayed, restrict model loading to internally hosted, cryptographically signed checkpoints—block external HuggingFace downloads at the network level.

  4. DETECTION

    Monitor for unexpected process spawning, outbound connections, or file writes originating from vLLM inference processes. Flag torch.load calls with weights_only=False in CI/CD code scanning.

  5. DEFENSE-IN-DEPTH: Run inference servers with least-privilege service accounts, network-isolated from sensitive internal systems, with egress filtering enabled on ML infrastructure.
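The CI/CD check in step 4 can be sketched with the standard-library ast module. This is a minimal illustration, not vLLM's or any scanner's actual implementation; the function name find_unsafe_torch_load is hypothetical, and the matching is deliberately narrow (it only recognizes the literal torch.load spelling):

```python
import ast

def find_unsafe_torch_load(source: str) -> list[int]:
    """Return line numbers of torch.load calls that omit weights_only
    or pass weights_only=False. Matches only the literal `torch.load`
    spelling; aliased imports would need extra handling."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "load"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "torch"):
            continue
        kw = next((k for k in node.keywords if k.arg == "weights_only"), None)
        if kw is None or (isinstance(kw.value, ast.Constant) and kw.value.value is False):
            findings.append(node.lineno)
    return findings

snippet = (
    "import torch\n"
    "ckpt = torch.load(path)\n"
    "safe = torch.load(path, weights_only=True)\n"
)
print(find_unsafe_torch_load(snippet))  # → [2]
```

A check like this can run as a pre-merge lint step so that unsafe torch.load calls never reach production code paths.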

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 9 - Risk management system
ISO 42001
A.6.1.6 - AI supply chain management
NIST AI RMF
GOVERN 6.1 - Third-party AI risk policies
OWASP LLM Top 10
LLM05:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2025-24357?

vLLM's model loading function uses torch.load with pickle deserialization enabled by default, allowing arbitrary code execution when loading a malicious model checkpoint. Any organization running vLLM < 0.7.0 to serve LLMs—especially pulling models from HuggingFace—is at risk of full inference server compromise. Upgrade to vLLM >= 0.7.0 immediately; if that's not possible, restrict model sources to internally verified checkpoints only.

Is CVE-2025-24357 actively exploited?

No confirmed active exploitation of CVE-2025-24357 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-24357?

1. PATCH: Upgrade vLLM to >= 0.7.0 immediately (fix sets weights_only=True in torch.load, removing pickle execution path).
2. INVENTORY: Identify all vLLM instances and their current versions across dev, staging, and production.
3. INTERIM WORKAROUND: If patching is delayed, restrict model loading to internally hosted, cryptographically signed checkpoints—block external HuggingFace downloads at the network level.
4. DETECTION: Monitor for unexpected process spawning, outbound connections, or file writes originating from vLLM inference processes. Flag torch.load calls with weights_only=False in CI/CD code scanning.
5. DEFENSE-IN-DEPTH: Run inference servers with least-privilege service accounts, network-isolated from sensitive internal systems, with egress filtering enabled on ML infrastructure.
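The interim workaround of restricting loads to internally verified checkpoints can be sketched as a digest-allowlist gate run before any model file is handed to the loader. APPROVED_DIGESTS and verify_checkpoint are hypothetical names for illustration, not part of vLLM:

```python
import hashlib

# Hypothetical allowlist of SHA-256 digests for internally vetted checkpoints.
# (The entry below is the digest of the bytes b"test", used only as a placeholder.)
APPROVED_DIGESTS = {
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large checkpoints don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_checkpoint(path: str) -> None:
    """Raise before an unapproved checkpoint ever reaches the model loader."""
    digest = sha256_of(path)
    if digest not in APPROVED_DIGESTS:
        raise ValueError(f"checkpoint {path} has unapproved digest {digest}")
```

Pairing a check like this with network-level blocking of external downloads ensures only checkpoints that passed internal review are deserialized at all.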

What systems are affected by CVE-2025-24357?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, model serving, model loading pipelines, AI/ML supply chain, on-premises LLM deployments.

What is the CVSS score for CVE-2025-24357?

CVE-2025-24357 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 1.01%.

Technical Details

NVD Description

vLLM is a library for LLM inference and serving. vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load the model checkpoint, which is downloaded from huggingface. It uses the torch.load function and the weights_only parameter defaults to False. When torch.load loads malicious pickle data, it will execute arbitrary code during unpickling. This vulnerability is fixed in v0.7.0.
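The fix's weights_only=True restricts what the unpickler may reconstruct. The same idea can be sketched with the standard-library pickle module alone: subclass pickle.Unpickler and block global lookups, so plain data deserializes while payloads that reference callables are rejected. This is an analogy to torch's behavior, not torch's actual implementation:

```python
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    """Refuse to resolve any global, so pickled references to callables
    cannot execute. Mirrors the idea behind torch.load(weights_only=True):
    plain containers and numbers load fine, code references do not."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain data round-trips without ever calling find_class.
print(safe_loads(pickle.dumps({"weights": [1.0, 2.0]})))

class Evil:
    def __reduce__(self):
        # A real payload would reference os.system or similar.
        return (print, ("payload would run here",))

try:
    safe_loads(pickle.dumps(Evil()))
except pickle.UnpicklingError as e:
    print("blocked:", e)
```

The key property is that the dangerous step (resolving a module-level callable) is exactly what the restricted loader refuses to do.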

Exploitation Scenario

An adversary creates a malicious model checkpoint embedding a Python pickle payload that executes a reverse shell or downloads a backdoor. They publish it to HuggingFace as a popular model variant—for example, a quantized version of a widely-used open-source LLM—with a convincing model card. An ML engineer at a target organization loads this model into vLLM for evaluation or production serving. When vLLM calls torch.load() with weights_only=False, the pickle payload executes automatically during deserialization, granting the attacker a shell on the inference server. From there, they exfiltrate API keys from environment variables, steal proprietary fine-tuned model weights, pivot to internal network segments, or establish persistent access on GPU infrastructure.
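The mechanics of the scenario above reduce to pickle's __reduce__ protocol: a pickled object can instruct the unpickler to call an arbitrary callable at load time. A minimal, harmless sketch (the class name is illustrative; a real payload would call os.system or subprocess instead of print):

```python
import pickle

class MaliciousCheckpoint:
    """Stand-in for a poisoned model file. __reduce__ tells pickle to call
    an arbitrary callable during unpickling; a real payload would invoke
    os.system or spawn a reverse shell rather than the harmless print below."""
    def __reduce__(self):
        return (print, ("pickle payload executed during load",))

payload = pickle.dumps(MaliciousCheckpoint())  # attacker ships these bytes as a "model"
result = pickle.loads(payload)  # victim "loads the model": print() runs immediately
```

No vulnerability in the victim's code is needed beyond calling pickle-based deserialization (here pickle.loads, in vLLM's case torch.load with weights_only=False) on attacker-controlled bytes.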

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

Timeline

Published
January 27, 2025
Last Modified
June 30, 2025
First Seen
January 27, 2025
