CVE-2025-24357: vLLM: unsafe deserialization RCE via model loading

GHSA-rh4j-5rhw-hr54 HIGH
Published January 27, 2025
CISO Take

vLLM's model loading function uses torch.load with pickle deserialization enabled by default, allowing arbitrary code execution when loading a malicious model checkpoint. Any organization running vLLM < 0.7.0 to serve LLMs—especially pulling models from HuggingFace—is at risk of full inference server compromise. Upgrade to vLLM >= 0.7.0 immediately; if that's not possible, restrict model sources to internally verified checkpoints only.

What is the risk?

High risk for production AI/ML environments. The CVSS score of 8.8 reflects a low-complexity network attack that needs no privileges, only user interaction (loading a model file). Pickle deserialization RCE is a well-understood, trivially weaponizable class of vulnerability; crafting a payload requires no AI/ML expertise. The real danger is the supply chain vector: ML engineers routinely download and load models from HuggingFace without security scrutiny, normalizing exactly the behavior this vulnerability exploits. An EPSS score of about 0.65% and the absence of known exploitation suggest no active exploitation yet, but the attack surface is broad given vLLM's widespread adoption in LLM serving infrastructure.

What systems are affected?

Package   Ecosystem   Vulnerable Range   Patched
vLLM      pip         (none given)       No patch
vLLM      pip         < 0.7.0            0.7.0
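
The vulnerable range above can be checked mechanically during an inventory pass. The following is a minimal sketch, assuming the installed distribution is named `vllm` and that comparing only the numeric release segment is acceptable (pre-release tags such as `rc1` are ignored, so `0.7.0rc1` is treated as `0.7.0`). The helper names are hypothetical.

```python
import re
from importlib import metadata

PATCHED = (0, 7, 0)  # first fixed release according to the advisory


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '0.6.6.post1' -> (0, 6, 6)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions so that '0.7' compares equal to '0.7.0'.
    return parts + (0,) * (len(PATCHED) - len(parts))


def is_vulnerable(version: str) -> bool:
    """True when the version falls in the advisory's '< 0.7.0' range."""
    return release_tuple(version) < PATCHED


def installed_vllm_status() -> str:
    """Report on the vllm distribution in the current environment, if any."""
    try:
        version = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vllm not installed"
    state = "VULNERABLE" if is_vulnerable(version) else "patched"
    return f"vllm {version}: {state}"


if __name__ == "__main__":
    print(installed_vllm_status())
```

Running the script inside each environment (container image, virtualenv, notebook kernel) gives one status line per environment, which is usually enough to seed an inventory.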

How severe is it?

CVSS 3.1: 8.8 / 10
EPSS: 0.6% chance of exploitation in 30 days (higher than 46% of all CVEs)
Exploitation Status: No known exploitation
Sophistication: Trivial

What is the attack surface?

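Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): Required
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High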

What should I do?

  1. PATCH

    Upgrade vLLM to >= 0.7.0 immediately (fix sets weights_only=True in torch.load, removing pickle execution path).

  2. INVENTORY

    Identify all vLLM instances and their current versions across dev, staging, and production.

  3. INTERIM WORKAROUND

    If patching is delayed, restrict model loading to internally hosted, cryptographically signed checkpoints—block external HuggingFace downloads at the network level.

  4. DETECTION

    Monitor for unexpected process spawning, outbound connections, or file writes originating from vLLM inference processes. Flag torch.load calls with weights_only=False in CI/CD code scanning.

  5. DEFENSE-IN-DEPTH

    Run inference servers with least-privilege service accounts, network-isolated from sensitive internal systems, with egress filtering enabled on ML infrastructure.
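
The code-scanning part of the DETECTION step could look something like the sketch below. It uses Python's standard `ast` module to flag `torch.load(...)` calls that omit `weights_only` or set it to anything other than the literal `True`. The function name is hypothetical, and the check only recognizes the spelling `torch.load`; aliases such as `from torch import load` or `import torch as t` would need extra handling.

```python
import ast


def find_unsafe_torch_load(source: str, filename: str = "<string>") -> list:
    """Return sorted (line, reason) pairs for torch.load calls lacking weights_only=True."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        keyword = next((k for k in node.keywords if k.arg == "weights_only"), None)
        if keyword is None:
            findings.append((node.lineno, "weights_only not set"))
        elif not (isinstance(keyword.value, ast.Constant) and keyword.value.value is True):
            findings.append((node.lineno, "weights_only is not literally True"))
    return sorted(findings)


if __name__ == "__main__":
    sample = (
        "import torch\n"
        "a = torch.load('m.pt')\n"
        "b = torch.load('m.pt', weights_only=False)\n"
        "c = torch.load('m.pt', weights_only=True)\n"
    )
    for line, reason in find_unsafe_torch_load(sample):
        print(f"line {line}: {reason}")
```

Treating a missing keyword as a finding is a deliberately conservative choice: the advisory describes the vulnerable code path as one where `weights_only` was left at a default of False.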

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Article 9 - Risk management system
ISO 42001: A.6.1.6 - AI supply chain management
NIST AI RMF: GOVERN 6.1 - Third-party AI risk policies
OWASP LLM Top 10: LLM05:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2025-24357?

CVE-2025-24357 is an unsafe deserialization vulnerability (CWE-502) in vLLM versions before 0.7.0. The model loading code calls torch.load with weights_only left at False, so a malicious model checkpoint can execute arbitrary code while it is being unpickled. Organizations that serve LLMs with an affected version, especially those pulling models from HuggingFace, risk full compromise of the inference server.

Is CVE-2025-24357 actively exploited?

No confirmed active exploitation of CVE-2025-24357 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-24357?

1. PATCH: Upgrade vLLM to >= 0.7.0 immediately (the fix sets weights_only=True in torch.load, removing the pickle execution path).
2. INVENTORY: Identify all vLLM instances and their current versions across dev, staging, and production.
3. INTERIM WORKAROUND: If patching is delayed, restrict model loading to internally hosted, cryptographically signed checkpoints, and block external HuggingFace downloads at the network level.
4. DETECTION: Monitor for unexpected process spawning, outbound connections, or file writes originating from vLLM inference processes. Flag torch.load calls with weights_only=False in CI/CD code scanning.
5. DEFENSE-IN-DEPTH: Run inference servers with least-privilege service accounts, network-isolated from sensitive internal systems, with egress filtering enabled on ML infrastructure.

What systems are affected by CVE-2025-24357?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, model serving, model loading pipelines, AI/ML supply chain, on-premises LLM deployments.

What is the CVSS score for CVE-2025-24357?

CVE-2025-24357 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 0.65%.

What is the AI security impact?

Affected AI Architectures

LLM inference serving, model serving, model loading pipelines, AI/ML supply chain, on-premises LLM deployments

MITRE ATLAS Techniques

AML.T0002.001 Models
AML.T0010.003 Model
AML.T0011.000 Unsafe AI Artifacts
AML.T0018.002 Embed Malware
AML.T0058 Publish Poisoned Models

Compliance Controls Affected

EU AI Act: Article 9
ISO 42001: A.6.1.6
NIST AI RMF: GOVERN 6.1
OWASP LLM Top 10: LLM05:2025

What are the technical details?

Original Advisory

vLLM is a library for LLM inference and serving. vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load the model checkpoint, which is downloaded from huggingface. It uses the torch.load function and the weights_only parameter defaults to False. When torch.load loads malicious pickle data, it will execute arbitrary code during unpickling. This vulnerability is fixed in v0.7.0.
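
To illustrate why unpickling untrusted data runs code, here is a deliberately harmless sketch using only the standard `pickle` module. The `Payload` class is a hypothetical stand-in: its `__reduce__` names `operator.add`, where a real attacker would name something like `os.system`. The `NoGlobalsUnpickler` class shows the general idea of a restricted unpickler; it is conceptually similar to restricting what a loader may resolve, but it is not PyTorch's actual `weights_only` implementation.

```python
import io
import operator
import pickle


class Payload:
    """Stand-in for a malicious object: __reduce__ picks a callable to run on load."""

    def __reduce__(self):
        # The unpickler will call operator.add(20, 22) instead of rebuilding Payload.
        return (operator.add, (20, 22))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuse every global lookup, so the stream cannot resolve any callable."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


blob = pickle.dumps(Payload())

# Plain unpickling executes the callable chosen by the stream's author.
result = pickle.loads(blob)

# The restricted unpickler rejects the same stream before anything is called.
try:
    NoGlobalsUnpickler(io.BytesIO(blob)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(result, blocked)
```

Here `result` is the return value of the injected call rather than a `Payload` instance, which is the behavior the advisory describes as executing arbitrary code during unpickling.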

Exploitation Scenario

An adversary creates a malicious model checkpoint embedding a Python pickle payload that executes a reverse shell or downloads a backdoor. They publish it to HuggingFace as a popular model variant—for example, a quantized version of a widely-used open-source LLM—with a convincing model card. An ML engineer at a target organization loads this model into vLLM for evaluation or production serving. When vLLM calls torch.load() with weights_only=False, the pickle payload executes automatically during deserialization, granting the attacker a shell on the inference server. From there, they exfiltrate API keys from environment variables, steal proprietary fine-tuned model weights, pivot to internal network segments, or establish persistent access on GPU infrastructure.
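
A pickle stream can be inspected without being executed. The sketch below, which assumes the raw pickle bytes have already been extracted from the checkpoint file, uses the standard `pickletools.genops` function to list opcodes that import globals or invoke callables. Legitimate PyTorch checkpoints also use some of these opcodes to rebuild tensors, so a practical scanner would compare the imported names against an allowlist; this example only shows that inspection is possible without loading. The function name and opcode set are illustrative choices.

```python
import pickle
import pickletools

# Opcodes that import a global or construct/call an object while the stream loads.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def suspicious_opcodes(data: bytes) -> list:
    """List (opcode name, byte offset) pairs without executing the pickle."""
    return [
        (opcode.name, position)
        for opcode, _arg, position in pickletools.genops(data)
        if opcode.name in SUSPICIOUS
    ]


if __name__ == "__main__":
    # A pickle of plain containers and floats needs none of the opcodes above.
    plain = pickle.dumps({"layer.weight": [0.1, 0.2]})
    print(suspicious_opcodes(plain))
```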

Weaknesses (CWE)

CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

  • [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
  • [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
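
The HMAC mitigation mentioned above might be applied to model checkpoints roughly as follows. This is a minimal sketch with hypothetical function names, assuming the signing key is held by an internal model registry and that a tag was recorded when the checkpoint was vetted. An HMAC only shows the bytes are unchanged since signing; it says nothing about whether the signed content was safe to begin with.

```python
import hashlib
import hmac


def sign_checkpoint(data: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the checkpoint bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_checkpoint(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Check the tag in constant time before the bytes reach any deserializer."""
    actual_tag = sign_checkpoint(data, key)
    return hmac.compare_digest(actual_tag, expected_tag)


if __name__ == "__main__":
    key = b"example-key-for-illustration-only"
    checkpoint = b"checkpoint bytes as vetted"
    tag = sign_checkpoint(checkpoint, key)
    print(verify_checkpoint(checkpoint, key, tag))
    print(verify_checkpoint(checkpoint + b"tampered", key, tag))
```

The verification call belongs before deserialization: a checkpoint that fails the check should never be handed to the loader.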

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
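
As a cross-check, the 8.8 base score can be recomputed from this vector. The sketch below follows the CVSS v3.1 base score equations for the Scope: Unchanged case, using the specification's numeric weights for the metric values in the vector. The function and constant names are this example's own.

```python
import math

# CVSS v3.1 metric weights for the values that appear in this vector.
AV_NETWORK = 0.85
AC_LOW = 0.77
PR_NONE = 0.85
UI_REQUIRED = 0.62
CIA_HIGH = 0.56


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for vectors with S:U."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_NETWORK, AC_LOW, PR_NONE, UI_REQUIRED, CIA_HIGH, CIA_HIGH, CIA_HIGH
)
print(score)  # 8.8
```

The impact sub-score is about 5.87 and the exploitability sub-score about 2.84; their sum of roughly 8.71 rounds up to 8.8.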

Timeline

Published: January 27, 2025
Last Modified: June 30, 2025
First Seen: January 27, 2025
