vLLM's model loading function uses torch.load with pickle deserialization enabled by default, allowing arbitrary code execution when loading a malicious model checkpoint. Any organization running vLLM < 0.7.0 to serve LLMs—especially pulling models from HuggingFace—is at risk of full inference server compromise. Upgrade to vLLM >= 0.7.0 immediately; if that's not possible, restrict model sources to internally verified checkpoints only.
Risk Assessment
High risk for production AI/ML environments. CVSS 8.8 reflects a low-complexity network attack requiring only user interaction (loading a model file). Pickle deserialization RCE is a well-understood, trivially weaponizable class of vulnerability—no AI/ML expertise is required to craft a payload. The real danger is the supply chain vector: ML engineers routinely download and load models from HuggingFace without security scrutiny, normalizing exactly the behavior this vulnerability exploits. An EPSS of roughly 1% indicates a low predicted likelihood of near-term exploitation, but the attack surface is broad given vLLM's widespread adoption in LLM serving infrastructure.
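To see why weaponization is trivial, here is a minimal, self-contained demo of the pickle primitive involved (hypothetical and deliberately harmless: the payload just prints instead of spawning a shell). Any object can smuggle a callable through __reduce__, and unpickling invokes it automatically:

```python
import pickle


class MaliciousCheckpoint:
    """Any picklable object can define __reduce__ to run code on unpickling."""

    def __reduce__(self):
        # A real payload would call os.system / subprocess with a reverse
        # shell; this demo only invokes the builtin print.
        return (print, ("arbitrary code ran during pickle.loads()",))


# The attacker serializes the object into what looks like a model file...
blob = pickle.dumps(MaliciousCheckpoint())

# ...and the victim's loader executes the payload the moment it deserializes.
# torch.load(path) with weights_only=False ends up in exactly this code path.
pickle.loads(blob)
```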
Recommended Action
5 steps:
1. PATCH: Upgrade vLLM to >= 0.7.0 immediately (the fix sets weights_only=True in torch.load, removing the pickle execution path).
2. INVENTORY: Identify all vLLM instances and their current versions across dev, staging, and production.
3. INTERIM WORKAROUND: If patching is delayed, restrict model loading to internally hosted, cryptographically signed checkpoints—block external HuggingFace downloads at the network level (a verification sketch follows this list).
4. DETECTION: Monitor for unexpected process spawning, outbound connections, or file writes originating from vLLM inference processes. Flag torch.load calls with weights_only=False in CI/CD code scanning (a scanner sketch follows this list).
5. DEFENSE-IN-DEPTH: Run inference servers with least-privilege service accounts, network-isolated from sensitive internal systems, with egress filtering enabled on ML infrastructure.
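A minimal sketch of the interim workaround's verification gate, assuming an internal allowlist of digests for reviewed checkpoints (APPROVED_SHA256, sha256_of, and verify_checkpoint are illustrative names, not part of vLLM or PyTorch):

```python
import hashlib
from pathlib import Path

# Hypothetical allowlist: SHA-256 digests of internally reviewed checkpoints.
APPROVED_SHA256 = {
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",  # example digest
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-gigabyte checkpoints don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()


def verify_checkpoint(path: str) -> None:
    """Raise before the file ever reaches torch.load / vLLM."""
    actual = sha256_of(Path(path))
    if actual not in APPROVED_SHA256:
        raise RuntimeError(f"unapproved checkpoint {path} (sha256={actual})")
```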
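And a rough sketch of the CI/CD scan from the detection step: an AST pass that flags any direct torch.load call that does not pass a literal weights_only=True. It is conservative by design (a non-literal argument is also flagged) and assumes file paths arrive on argv:

```python
import ast
import sys


def unsafe_torch_loads(source: str, filename: str) -> list:
    """Return line numbers of torch.load calls without a literal weights_only=True."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        # Match only direct attribute calls of the form torch.load(...).
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "load"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "torch"):
            continue
        kwargs = {k.arg: k.value for k in node.keywords}
        value = kwargs.get("weights_only")
        safe = isinstance(value, ast.Constant) and value.value is True
        if not safe:
            hits.append(node.lineno)
    return hits


if __name__ == "__main__":
    failed = False
    for fname in sys.argv[1:]:
        with open(fname) as f:
            for line in unsafe_torch_loads(f.read(), fname):
                print(f"{fname}:{line}: torch.load without weights_only=True")
                failed = True
    sys.exit(1 if failed else 0)
```

Wired into CI, a nonzero exit fails the build whenever an unguarded load slips into the tree.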
Frequently Asked Questions
What is CVE-2025-24357?
vLLM's model loading function uses torch.load with pickle deserialization enabled by default, allowing arbitrary code execution when loading a malicious model checkpoint. Any organization running vLLM < 0.7.0 to serve LLMs—especially pulling models from HuggingFace—is at risk of full inference server compromise. Upgrade to vLLM >= 0.7.0 immediately; if that's not possible, restrict model sources to internally verified checkpoints only.
Is CVE-2025-24357 actively exploited?
No confirmed active exploitation of CVE-2025-24357 has been reported, but organizations should still patch proactively.
How to fix CVE-2025-24357?
1. PATCH: Upgrade vLLM to >= 0.7.0 immediately (fix sets weights_only=True in torch.load, removing pickle execution path). 2. INVENTORY: Identify all vLLM instances and their current versions across dev, staging, and production. 3. INTERIM WORKAROUND: If patching is delayed, restrict model loading to internally hosted, cryptographically signed checkpoints—block external HuggingFace downloads at the network level. 4. DETECTION: Monitor for unexpected process spawning, outbound connections, or file writes originating from vLLM inference processes. Flag torch.load calls with weights_only=False in CI/CD code scanning. 5. DEFENSE-IN-DEPTH: Run inference servers with least-privilege service accounts, network-isolated from sensitive internal systems, with egress filtering enabled on ML infrastructure.
What systems are affected by CVE-2025-24357?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, model serving, model loading pipelines, AI/ML supply chain, on-premises LLM deployments.
What is the CVSS score for CVE-2025-24357?
CVE-2025-24357 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 1.01%.
Technical Details
NVD Description
vLLM is a library for LLM inference and serving. vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load the model checkpoint, which is downloaded from huggingface. It uses the torch.load function and the weights_only parameter defaults to False. When torch.load loads malicious pickle data, it will execute arbitrary code during unpickling. This vulnerability is fixed in v0.7.0.
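A simplified before/after of the load call described above (not the actual vLLM source; note that recent PyTorch releases have since flipped torch.load's default to weights_only=True on their own):

```python
import torch

# Stand-in for a checkpoint downloaded from Hugging Face.
torch.save({"w": torch.zeros(2, 2)}, "checkpoint.pt")

# Pre-v0.7.0 pattern (simplified): with weights_only left as False, the
# full pickle machinery runs, and a malicious checkpoint executes code here.
state = torch.load("checkpoint.pt")

# Patched pattern: torch's restricted unpickler reconstructs only tensors
# and plain containers, so rogue pickle opcodes raise UnpicklingError.
state = torch.load("checkpoint.pt", weights_only=True)
```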
Exploitation Scenario
An adversary creates a malicious model checkpoint embedding a Python pickle payload that executes a reverse shell or downloads a backdoor. They publish it to HuggingFace as a popular model variant—for example, a quantized version of a widely-used open-source LLM—with a convincing model card. An ML engineer at a target organization loads this model into vLLM for evaluation or production serving. When vLLM calls torch.load() with weights_only=False, the pickle payload executes automatically during deserialization, granting the attacker a shell on the inference server. From there, they exfiltrate API keys from environment variables, steal proprietary fine-tuned model weights, pivot to internal network segments, or establish persistent access on GPU infrastructure.
Weaknesses (CWE)
- CWE-502: Deserialization of Untrusted Data
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
References
- github.com/advisories/GHSA-rh4j-5rhw-hr54
- github.com/pypa/advisory-database/tree/main/vulns/vllm/PYSEC-2025-58.yaml
- github.com/vllm-project/vllm/releases/tag/v0.7.0
- nvd.nist.gov/vuln/detail/CVE-2025-24357
- github.com/vllm-project/vllm/commit/d3d6bb13fb62da3234addf6574922a4ec0513d04 (Patch)
- github.com/vllm-project/vllm/pull/12366 (Issue, Patch)
- github.com/vllm-project/vllm/security/advisories/GHSA-rh4j-5rhw-hr54 (Vendor Advisory)
- pytorch.org/docs/stable/generated/torch.load.html (Technical)
Related Vulnerabilities (same package: vllm)
- CVE-2024-9053 (9.8): vllm: RCE via unsafe pickle deserialization in RPC server
- CVE-2026-25960 (9.8): vllm: SSRF allows internal network access
- CVE-2025-47277 (9.8): vLLM: RCE via exposed TCPStore in distributed inference
- CVE-2024-11041 (9.8): vllm: RCE via unsafe pickle deserialization in MessageQueue
- CVE-2025-32444 (9.8): vLLM: RCE via pickle deserialization on ZeroMQ