vLLM's model loading function uses torch.load with pickle deserialization enabled by default, allowing arbitrary code execution when loading a malicious model checkpoint. Any organization running vLLM < 0.7.0 to serve LLMs, especially one pulling models from HuggingFace, is at risk of full inference server compromise. Upgrade to vLLM >= 0.7.0 immediately; if that's not possible, restrict model sources to internally verified checkpoints only.
What is the risk?
High risk for production AI/ML environments. The CVSS score of 8.8 reflects a low-complexity network attack that requires no privileges and only one user interaction: loading a model file. Pickle deserialization RCE is a well-understood, trivially weaponizable class of vulnerability, and crafting a payload requires no AI/ML expertise. The real danger is the supply chain vector: ML engineers routinely download and load models from HuggingFace without security scrutiny, normalizing exactly the behavior this vulnerability exploits. The EPSS score of 0.65% indicates a low predicted likelihood of near-term exploitation, and no active exploitation has been confirmed, but the attack surface is broad given vLLM's widespread adoption in LLM serving infrastructure.
What should I do?
1. PATCH: Upgrade vLLM to >= 0.7.0 immediately (fix sets weights_only=True in torch.load, removing pickle execution path).
2. INVENTORY: Identify all vLLM instances and their current versions across dev, staging, and production.
3. INTERIM WORKAROUND: If patching is delayed, restrict model loading to internally hosted, cryptographically signed checkpoints and block external HuggingFace downloads at the network level.
4. DETECTION: Monitor for unexpected process spawning, outbound connections, or file writes originating from vLLM inference processes. Flag torch.load calls with weights_only=False in CI/CD code scanning.
5. DEFENSE-IN-DEPTH: Run inference servers with least-privilege service accounts, network-isolated from sensitive internal systems, with egress filtering enabled on ML infrastructure.
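The inventory step can be partly automated on each host. The following is a minimal sketch, assuming vLLM was installed as the Python package `vllm` in the environment being checked; the helper names (`parse_version`, `is_vulnerable`, `check_installed`) are illustrative and not part of vLLM. It compares only the leading numeric release components, so a pre-release such as `0.7.0rc1` would be treated as patched and may need manual review.

```python
import re
from importlib import metadata

FIXED_VERSION = (0, 7, 0)  # first fixed release according to the advisory


def parse_version(version: str) -> tuple:
    """Return the leading numeric components, e.g. '0.6.3.post1' -> (0, 6, 3)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_vulnerable(version: str) -> bool:
    """True when the version sorts before the fixed release."""
    return parse_version(version) < FIXED_VERSION


def check_installed(package: str = "vllm") -> str:
    """Describe the patch status of the package in the current environment."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    status = "VULNERABLE, upgrade to >= 0.7.0" if is_vulnerable(installed) else "patched"
    return f"{package} {installed}: {status}"


if __name__ == "__main__":
    print(check_installed())
```

Running the script inside every virtual environment or container image that serves models gives a quick per-host answer; it does not discover hosts on its own.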
Frequently Asked Questions
What is CVE-2025-24357?
CVE-2025-24357 is an unsafe deserialization flaw in vLLM's model loading function, which uses torch.load with pickle deserialization enabled by default and therefore allows arbitrary code execution when loading a malicious model checkpoint. Any organization running vLLM < 0.7.0 to serve LLMs, especially one pulling models from HuggingFace, is at risk of full inference server compromise. Upgrade to vLLM >= 0.7.0 immediately; if that's not possible, restrict model sources to internally verified checkpoints only.
Is CVE-2025-24357 actively exploited?
No confirmed active exploitation of CVE-2025-24357 has been reported, but organizations should still patch proactively.
How to fix CVE-2025-24357?
1. PATCH: Upgrade vLLM to >= 0.7.0 immediately (fix sets weights_only=True in torch.load, removing pickle execution path).
2. INVENTORY: Identify all vLLM instances and their current versions across dev, staging, and production.
3. INTERIM WORKAROUND: If patching is delayed, restrict model loading to internally hosted, cryptographically signed checkpoints and block external HuggingFace downloads at the network level.
4. DETECTION: Monitor for unexpected process spawning, outbound connections, or file writes originating from vLLM inference processes. Flag torch.load calls with weights_only=False in CI/CD code scanning.
5. DEFENSE-IN-DEPTH: Run inference servers with least-privilege service accounts, network-isolated from sensitive internal systems, with egress filtering enabled on ML infrastructure.
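The code-scanning part of the detection step can be approximated with Python's `ast` module. This is a sketch rather than a complete linter: the function name `find_unsafe_torch_load` is hypothetical, it only matches calls written literally as `torch.load(...)`, and it misses aliased imports such as `from torch import load`. It treats a missing `weights_only` argument as unsafe because the advisory describes the default as False.

```python
import ast


def find_unsafe_torch_load(source: str, filename: str = "<string>") -> list:
    """Return sorted (line, reason) tuples for torch.load calls lacking weights_only=True."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        # Look for an explicit weights_only keyword on the call.
        keyword = next((k for k in node.keywords if k.arg == "weights_only"), None)
        if keyword is None:
            findings.append((node.lineno, "weights_only not set"))
        elif not (isinstance(keyword.value, ast.Constant) and keyword.value.value is True):
            findings.append((node.lineno, "weights_only is not the literal True"))
    return sorted(findings)


sample = (
    "import torch\n"
    "a = torch.load('m.pt')\n"
    "b = torch.load('m.pt', weights_only=False)\n"
    "c = torch.load('m.pt', weights_only=True)\n"
)
for line, reason in find_unsafe_torch_load(sample):
    print(f"line {line}: {reason}")
```

In a CI job, the same function could be applied to every `.py` file in the repository and the build failed when the result is non-empty.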
What systems are affected by CVE-2025-24357?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, model serving, model loading pipelines, AI/ML supply chain, on-premises LLM deployments.
What is the CVSS score for CVE-2025-24357?
CVE-2025-24357 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 0.65%.
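The 8.8 base score can be reproduced from the vector string published for this CVE (CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H). The sketch below applies the CVSS v3.1 base-score equations for the Scope: Unchanged case only; the function names are illustrative, and the metric weights are taken from the CVSS v3.1 specification.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged values).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place using the spec's integer-based method."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a Scope: Unchanged vector."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H")
print(score)
```

The only metric keeping the score below 9.8 is UI:R: someone has to load the malicious checkpoint.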
What is the AI security impact?
Affected AI Architectures
MITRE ATLAS Techniques
- AML.T0002.001 Models
- AML.T0010.003 Model
- AML.T0011.000 Unsafe AI Artifacts
- AML.T0018.002 Embed Malware
- AML.T0058 Publish Poisoned Models
Compliance Controls Affected
What are the technical details?
Original Advisory
vLLM is a library for LLM inference and serving. vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load the model checkpoint, which is downloaded from huggingface. It uses the torch.load function and the weights_only parameter defaults to False. When torch.load loads malicious pickle data, it will execute arbitrary code during unpickling. This vulnerability is fixed in v0.7.0.
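The mechanism can be illustrated with the standard library alone. The sketch below is not vLLM or PyTorch code: `Payload` is a harmless stand-in whose `__reduce__` asks the unpickler to call `eval`, and `NoGlobalsUnpickler` is a hypothetical restricted unpickler in the style of the "restricting globals" pattern from the Python pickle documentation. Loading with weights_only=True relies on the same general idea of limiting what the unpickler may resolve, though its allow-list is more permissive than this one.

```python
import io
import pickle


class Payload:
    """Stand-in for a malicious object; the embedded call is deliberately harmless."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')" and performs the call while loading.
        return (eval, ("6 * 7",))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global lookup, so no callable can be resolved."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Unpickle data while rejecting any reference to an importable object."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


blob = pickle.dumps(Payload())

# Plain unpickling runs the embedded call and returns its result.
executed = pickle.loads(blob)

# The restricted unpickler rejects the same bytes before the call can happen.
try:
    safe_loads(blob)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

# Plain containers and numbers need no globals, so they still load.
plain = safe_loads(pickle.dumps({"layer.weight": [0.1, 0.2]}))
```

A real attacker would replace `eval("6 * 7")` with any callable reachable by import, which is why unpickling untrusted data with the default unpickler amounts to running untrusted code.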
Exploitation Scenario
An adversary creates a malicious model checkpoint embedding a Python pickle payload that executes a reverse shell or downloads a backdoor. They publish it to HuggingFace as a popular model variant—for example, a quantized version of a widely-used open-source LLM—with a convincing model card. An ML engineer at a target organization loads this model into vLLM for evaluation or production serving. When vLLM calls torch.load() with weights_only=False, the pickle payload executes automatically during deserialization, granting the attacker a shell on the inference server. From there, they exfiltrate API keys from environment variables, steal proprietary fine-tuned model weights, pivot to internal network segments, or establish persistent access on GPU infrastructure.
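A payload of this kind can often be spotted without executing it, by listing the importable objects a pickle stream refers to. The following sketch uses the standard-library `pickletools` module; `referenced_globals` is a hypothetical helper, and its handling of the STACK_GLOBAL opcode is a heuristic (it assumes the module and attribute names are the two most recent string constants). PyTorch checkpoints are commonly zip archives that contain a pickle member, so the raw pickle bytes would have to be extracted first, and legitimate checkpoints also reference some globals, so results need to be compared against an allow-list rather than required to be empty.

```python
import pickle
import pickletools


def referenced_globals(data: bytes) -> list:
    """List 'module.name' globals a pickle would import, without unpickling it."""
    found = []
    strings = []  # string constants seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols carry "module name" directly as the opcode argument.
            module, name = arg.split(" ", 1)
            found.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Protocol 4+ pushes module and name as strings before this opcode.
            found.append(f"{strings[-2]}.{strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return found


class Evil:
    """Harmless demonstration object that asks the unpickler to call eval."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


suspicious = referenced_globals(pickle.dumps(Evil()))
benign = referenced_globals(pickle.dumps({"a": [1, 2]}))
print(suspicious, benign)
```

References to modules such as `builtins`, `os`, or `subprocess` in a model file are a strong signal that it should not be loaded.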
Weaknesses (CWE)
CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.
- [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
- [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
Source: MITRE CWE corpus.
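The HMAC mitigation mentioned above could look roughly like the following for checkpoint files. This is a sketch under stated assumptions: the function names are illustrative, the key would in practice come from a secrets manager rather than source code, and the tag would be stored alongside each internally approved checkpoint. Verification has to happen before the bytes are handed to any deserializer.

```python
import hashlib
import hmac


def sign_checkpoint(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the checkpoint bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_checkpoint(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Check the tag in constant time; only deserialize when this returns True."""
    return hmac.compare_digest(sign_checkpoint(data, key), expected_tag)


key = b"example-key-for-illustration-only"  # hypothetical; never hard-code real keys
original = b"checkpoint bytes as approved internally"
tag = sign_checkpoint(original, key)

accepted = verify_checkpoint(original, key, tag)
tampered = verify_checkpoint(original + b"\x00", key, tag)
print(accepted, tampered)
```

An HMAC only proves that a file matches what the key holder approved; it says nothing about whether the approved file was safe in the first place, so it complements rather than replaces loading with weights_only=True.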
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
References
- github.com/advisories/GHSA-rh4j-5rhw-hr54
- github.com/pypa/advisory-database/tree/main/vulns/vllm/PYSEC-2025-58.yaml
- github.com/vllm-project/vllm/releases/tag/v0.7.0
- nvd.nist.gov/vuln/detail/CVE-2025-24357
- github.com/vllm-project/vllm/commit/d3d6bb13fb62da3234addf6574922a4ec0513d04 (Patch)
- github.com/vllm-project/vllm/pull/12366 (Issue, Patch)
- github.com/vllm-project/vllm/security/advisories/GHSA-rh4j-5rhw-hr54 (Vendor)
- pytorch.org/docs/stable/generated/torch.load.html (Technical)
Related Vulnerabilities
- CVE-2024-11041 9.8 vllm: RCE via unsafe pickle deserialization in MessageQueue (Same package: vllm)
- CVE-2024-9053 9.8 vllm: RCE via unsafe pickle deserialization in RPC server (Same package: vllm)
- CVE-2026-25960 9.8 vllm: SSRF allows internal network access (Same package: vllm)
- CVE-2025-47277 9.8 vLLM: RCE via exposed TCPStore in distributed inference (Same package: vllm)
- CVE-2025-32444 9.8 vLLM: RCE via pickle deserialization on ZeroMQ (Same package: vllm)