CVE-2025-24357 — HIGH (CVSS 8.8) AI Security Vulnerability

CISO Take

vLLM's model loading function uses torch.load with pickle deserialization enabled by default, allowing arbitrary code execution when loading a malicious model checkpoint. Any organization running vLLM < 0.7.0 to serve LLMs—especially pulling models from HuggingFace—is at risk of full inference server compromise. Upgrade to vLLM >= 0.7.0 immediately; if that's not possible, restrict model sources to internally verified checkpoints only.

Risk Assessment

High risk for production AI/ML environments. The CVSS score of 8.8 reflects a low-complexity network attack that needs no privileges, only user interaction (someone loading a model file). Pickle deserialization RCE is a well-understood, trivially weaponizable vulnerability class: crafting a payload requires no AI/ML expertise. The real danger is the supply chain vector. ML engineers routinely download and load models from HuggingFace without security scrutiny, which normalizes exactly the behavior this vulnerability exploits. No exploitation in the wild has been reported, and the EPSS score of about 1% indicates a low predicted probability of exploitation in the next 30 days, but the attack surface is broad given vLLM's widespread adoption in LLM serving infrastructure.

Affected Systems

Package	Ecosystem	Vulnerable Range	Patched
vllm	pip	< 0.7.0	0.7.0
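A quick way to check a single environment against the vulnerable range in the table above is sketched below. It uses only the standard library and compares the leading numeric part of the installed version with 0.7.0. The helper names (`release_tuple`, `is_vulnerable`, `installed_vllm_status`) are illustrative, and pre-release or local version suffixes are ignored, so treat this as a rough check rather than a full version parser.

```python
import re
from importlib import metadata

PATCHED = (0, 7, 0)  # first fixed release, per the advisory


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '0.6.3.post1' -> (0, 6, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    # Pad short versions such as '0.7' to three components.
    return parts + (0,) * (3 - len(parts))


def is_vulnerable(version: str) -> bool:
    """True when the release is older than 0.7.0 (suffixes are ignored)."""
    return release_tuple(version) < PATCHED


def installed_vllm_status() -> str:
    """Describe the vllm package found in the current environment, if any."""
    try:
        version = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vllm is not installed in this environment"
    state = "VULNERABLE" if is_vulnerable(version) else "patched"
    return f"vllm {version}: {state}"


print(installed_vllm_status())
```

Running the same check inside every image or virtual environment that serves models gives a first-pass inventory; a software bill of materials or package-audit tool would be the more thorough option.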

Severity & Risk

CVSS 3.1

8.8 / 10

EPSS

1.0%

chance of exploitation in 30 days

Higher than 77% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

No known exploitation

Sophistication

Trivial

Attack Surface

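Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): Required
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High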
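The metric values listed above correspond to the vector string CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H. As a worked check, the sketch below recomputes the base score from the numeric weights and formulas published in the CVSS v3.1 specification for the Scope: Unchanged case. The function names are illustrative.

```python
import math

# Numeric weights from the CVSS v3.1 specification for this CVE's metrics.
AV_NETWORK = 0.85
AC_LOW = 0.77
PR_NONE = 0.85
UI_REQUIRED = 0.62
CIA_HIGH = 0.56


def roundup(value: float) -> float:
    """Round up to one decimal place, following the spec's Roundup function."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score formula for vectors with Scope: Unchanged."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_NETWORK, AC_LOW, PR_NONE, UI_REQUIRED, CIA_HIGH, CIA_HIGH, CIA_HIGH
)
print(score)
```

The impact term is about 5.87 and the exploitability term about 2.84, so the sum of roughly 8.71 rounds up to 8.8, matching the published score.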

Recommended Action

PATCH

Upgrade vLLM to >= 0.7.0 immediately (fix sets weights_only=True in torch.load, removing pickle execution path).

INVENTORY

Identify all vLLM instances and their current versions across dev, staging, and production.

INTERIM WORKAROUND

If patching is delayed, restrict model loading to internally hosted, cryptographically signed checkpoints, and block external HuggingFace downloads at the network level.

DETECTION

Monitor for unexpected process spawning, outbound connections, or file writes originating from vLLM inference processes. Flag torch.load calls with weights_only=False in CI/CD code scanning.

DEFENSE-IN-DEPTH

Run inference servers with least-privilege service accounts, network-isolated from sensitive internal systems, with egress filtering enabled on ML infrastructure.
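The CI/CD scanning idea from the DETECTION step could look roughly like the sketch below, which walks a Python file's syntax tree and reports `torch.load` calls that do not pass a literal `weights_only=True`. Flagging calls that omit the argument is an assumption added here, based on the NVD note that the parameter defaulted to False. The function name `find_unsafe_torch_load` is hypothetical, and the check only matches the literal spelling `torch.load`, so aliased imports would be missed.

```python
import ast


def find_unsafe_torch_load(source: str, filename: str = "<string>") -> list:
    """Return sorted (line, reason) pairs for risky torch.load calls."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        # Look for an explicit weights_only keyword on the call.
        keyword = next((k for k in node.keywords if k.arg == "weights_only"), None)
        if keyword is None:
            findings.append((node.lineno, "weights_only not set"))
        elif not (
            isinstance(keyword.value, ast.Constant) and keyword.value.value is True
        ):
            findings.append((node.lineno, "weights_only is not literally True"))
    return sorted(findings)


sample = '''import torch
a = torch.load("a.pt")
b = torch.load("b.pt", weights_only=False)
c = torch.load("c.pt", weights_only=True)
'''

for line, reason in find_unsafe_torch_load(sample):
    print(f"line {line}: {reason}")
```

In a pipeline, the same function would be run over each tracked `.py` file and the job failed when the findings list is non-empty.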

Classification

Supply Chain Code Execution Framework Inference Model

AML.T0002.001 - Models

AML.T0010.003 - Model

AML.T0011.000 - Unsafe AI Artifacts

AML.T0018.002 - Embed Malware

AML.T0058 - Publish Poisoned Models

Compliance Impact

This CVE is relevant to:

EU AI Act

Article 9 - Risk management system

ISO 42001

A.6.1.6 - AI supply chain management

NIST AI RMF

GOVERN 6.1 - Third-party AI risk policies

OWASP LLM Top 10

LLM05:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2025-24357?

CVE-2025-24357 is a HIGH-severity (CVSS 8.8) vulnerability in vLLM versions earlier than 0.7.0. The model loading code calls torch.load with weights_only left at False, so loading a malicious model checkpoint, for example one downloaded from HuggingFace, executes arbitrary code during unpickling and can fully compromise the inference server. It is fixed in vLLM 0.7.0.

Is CVE-2025-24357 actively exploited?

No confirmed active exploitation of CVE-2025-24357 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-24357?

1. PATCH: Upgrade vLLM to >= 0.7.0 immediately (fix sets weights_only=True in torch.load, removing pickle execution path).
2. INVENTORY: Identify all vLLM instances and their current versions across dev, staging, and production.
3. INTERIM WORKAROUND: If patching is delayed, restrict model loading to internally hosted, cryptographically signed checkpoints, and block external HuggingFace downloads at the network level.
4. DETECTION: Monitor for unexpected process spawning, outbound connections, or file writes originating from vLLM inference processes. Flag torch.load calls with weights_only=False in CI/CD code scanning.
5. DEFENSE-IN-DEPTH: Run inference servers with least-privilege service accounts, network-isolated from sensitive internal systems, with egress filtering enabled on ML infrastructure.
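For the interim workaround, one simple way to restrict loading to verified checkpoints is to compare each file's SHA-256 digest against an allowlist recorded when the checkpoint was reviewed. The sketch below shows that idea with a throwaway file. The names `sha256_of` and `is_approved` are illustrative, and a digest allowlist provides integrity checking only: it is a simpler stand-in for real cryptographic signing, which would also need key management.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_approved(path: Path, approved_digests: set) -> bool:
    """True only when the file's digest is in the reviewed allowlist."""
    return sha256_of(path) in approved_digests


# Demonstration with a temporary file standing in for a checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "model.bin"
    checkpoint.write_bytes(b"reviewed weights")
    approved = {sha256_of(checkpoint)}  # recorded at review time

    ok_before = is_approved(checkpoint, approved)

    checkpoint.write_bytes(b"tampered weights")  # simulate a swapped file
    ok_after = is_approved(checkpoint, approved)

print(ok_before, ok_after)
```

The check only helps if it runs before the file is handed to any loader, and if the allowlist itself is stored somewhere the model source cannot modify.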

What systems are affected by CVE-2025-24357?

The vulnerability affects the vllm pip package in versions earlier than 0.7.0. It is relevant to the following AI/ML architecture patterns: LLM inference serving, model serving, model loading pipelines, AI/ML supply chain, and on-premises LLM deployments.

What is the CVSS score for CVE-2025-24357?

CVE-2025-24357 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 1.01%.

Technical Details

NVD Description

vLLM is a library for LLM inference and serving. vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load the model checkpoint, which is downloaded from huggingface. It uses the torch.load function and the weights_only parameter defaults to False. When torch.load loads malicious pickle data, it will execute arbitrary code during unpickling. This vulnerability is fixed in v0.7.0.
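The mechanism behind "execute arbitrary code during unpickling" can be shown with the standard library alone. In the sketch below, a class's `__reduce__` method tells pickle which callable to invoke at load time; a harmless `len` call stands in for what would be a dangerous callable in a real attack. The second half follows the "restricting globals" pattern from the Python pickle documentation by overriding `find_class`. `Payload`, `DenyGlobalsUnpickler`, and `safe_loads` are hypothetical names for illustration, and this is not PyTorch's actual `weights_only` implementation.

```python
import io
import pickle


class Payload:
    """Stand-in for a malicious object hidden in a checkpoint file."""

    def __reduce__(self):
        # pickle will call len("abc") when the data is loaded. An attacker
        # would name a dangerous callable here instead of a harmless one.
        return (len, ("abc",))


class DenyGlobalsUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so no callable can be resolved."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    return DenyGlobalsUnpickler(io.BytesIO(data)).load()


blob = pickle.dumps(Payload())

# Unrestricted load: the callable runs and its return value is what comes back.
result = pickle.loads(blob)

# Restricted load: plain containers still work, the payload is rejected.
plain = safe_loads(pickle.dumps({"layer.weight": [0.1, 0.2]}))
try:
    safe_loads(blob)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(result, plain, blocked)
```

The unrestricted load returns the result of the attacker-chosen call rather than a `Payload` object, which is why simply opening an untrusted pickle-based checkpoint is enough to trigger the code.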

Exploitation Scenario

An adversary creates a malicious model checkpoint embedding a Python pickle payload that executes a reverse shell or downloads a backdoor. They publish it to HuggingFace as a popular model variant—for example, a quantized version of a widely-used open-source LLM—with a convincing model card. An ML engineer at a target organization loads this model into vLLM for evaluation or production serving. When vLLM calls torch.load() with weights_only=False, the pickle payload executes automatically during deserialization, granting the attacker a shell on the inference server. From there, they exfiltrate API keys from environment variables, steal proprietary fine-tuned model weights, pivot to internal network segments, or establish persistent access on GPU infrastructure.
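Because the payload in this scenario runs during deserialization, any inspection has to happen without loading the file. The sketch below uses the standard `pickletools` module to walk a pickle stream's opcodes and list the globals it references, without executing anything. It is a heuristic under stated assumptions: `referenced_globals` is a hypothetical helper, the handling of `STACK_GLOBAL` simply takes the two most recent string arguments, and real PyTorch checkpoints wrap their pickle data in an archive and legitimately reference some globals, so results would need an allowlist to interpret.

```python
import pickle
import pickletools

STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def referenced_globals(data: bytes) -> set:
    """List 'module.name' references in a pickle stream without executing it."""
    found = set()
    recent_strings = []
    for opcode, arg, _position in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
        elif opcode.name in ("GLOBAL", "INST"):
            # These opcodes carry "module name" as one space-separated argument.
            module, name = arg.split(" ", 1)
            found.add(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Heuristic: module and name are usually the two latest strings.
            found.add(f"{recent_strings[-2]}.{recent_strings[-1]}")
    return found


class Payload:
    """Harmless stand-in for an object that runs a callable on load."""

    def __reduce__(self):
        return (len, ("abc",))


benign = referenced_globals(pickle.dumps({"w": [1.0, 2.0]}, protocol=4))
suspicious = referenced_globals(pickle.dumps(Payload(), protocol=4))

print(benign)
print(suspicious)
```

A plain dictionary of numbers references no globals, while the payload's stream reveals the callable it would invoke. A determined attacker can obfuscate references, so a scan like this complements, rather than replaces, patching and source restrictions.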