CVE-2025-32444: vLLM: RCE via pickle deserialization on ZeroMQ
GHSA-hj4w-hm2g-p6w5 | CRITICAL | PoC AVAILABLE | CISA SSVC: Track*

Any vLLM deployment using the mooncake KV-transfer integration (versions 0.6.5 up to, but not including, 0.8.5) exposes an unauthenticated, network-accessible RCE primitive — upgrade to 0.8.5 immediately. If you cannot patch today, disable the mooncake integration or firewall the ZeroMQ ports at the network perimeter. This is inference-server-level compromise: an attacker who reaches those sockets owns the host, the model weights, and everything in the same network segment.
Risk Assessment
Severity is effectively maximum for affected deployments. CVSS 9.8 with no authentication, no user interaction, and low attack complexity means any network-reachable vLLM+mooncake instance is trivially exploitable. The sockets listen on 0.0.0.0 (all interfaces), so cloud-hosted inference endpoints, Kubernetes clusters with improperly scoped network policies, and any externally reachable GPU node are all in scope. EPSS of 2.5% is relatively low today, but the vulnerability is straightforward enough that weaponized exploits are a matter of days from public disclosure. No evidence of KEV listing yet, but the exploit surface is unusually clean.
Recommended Action
1) Patch: upgrade to vLLM >= 0.8.5 — this is the only full remediation.
2) Workaround (if patching is blocked): disable the mooncake integration entirely via configuration; confirm the ZeroMQ sockets are no longer bound.
3) Network control: enforce firewall rules or Kubernetes NetworkPolicies restricting ZeroMQ port access to trusted inter-node IPs only; block all external access to the ZeroMQ ports (the port range varies by configuration — list listening sockets with `ss -tlnp` and match the vLLM process PIDs, or check vLLM logs for bind addresses).
4) Detection: alert on unexpected outbound connections from inference hosts, unexpected child processes spawned by the vLLM process, and anomalous network traffic on high-numbered TCP ports from GPU nodes.
5) Audit: inventory all vLLM deployments with `pip show vllm` and confirm mooncake usage in config files before assuming you are not affected.
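The audit step can be sketched in Python. This is a minimal helper, not part of vLLM or any official tooling; it assumes vLLM is installed in the current environment and compares the reported version against the affected range stated in the advisory (>= 0.6.5, < 0.8.5).

```python
from importlib.metadata import version, PackageNotFoundError


def parse_version(v: str) -> tuple:
    # Naive numeric parse; sufficient for vLLM's X.Y.Z release tags
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())


def is_affected(v: str) -> bool:
    # Advisory range: >= 0.6.5 and < 0.8.5 (fixed in 0.8.5).
    # Being in range is necessary but not sufficient: the host is
    # only exploitable if the mooncake integration is enabled.
    return (0, 6, 5) <= parse_version(v) < (0, 8, 5)


def check_vllm() -> str:
    try:
        v = version("vllm")
    except PackageNotFoundError:
        return "vllm not installed"
    status = "IN AFFECTED RANGE" if is_affected(v) else "outside affected range"
    return f"vllm {v}: {status}"
```

Even a version in the affected range is only exploitable when the mooncake integration is configured, so pair this check with a review of each deployment's KV-transfer configuration.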
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
What is CVE-2025-32444?
CVE-2025-32444 is a critical (CVSS 9.8) remote code execution vulnerability in vLLM's mooncake KV-transfer integration, affecting versions from 0.6.5 up to, but not including, 0.8.5. The integration deserializes pickle data received over unauthenticated ZeroMQ sockets bound to all network interfaces, so any attacker who can reach those sockets can execute arbitrary code on the inference host. vLLM instances that do not use the mooncake integration are not vulnerable; the issue is fixed in version 0.8.5.
Is CVE-2025-32444 actively exploited?
There is no confirmed evidence of active exploitation or a KEV listing yet, but proof-of-concept exploit code is publicly available for CVE-2025-32444, which significantly increases the risk of exploitation.
How to fix CVE-2025-32444?
1) Patch: upgrade to vLLM >= 0.8.5 — this is the only full remediation. 2) Workaround (if patching is blocked): disable the mooncake integration entirely via configuration; confirm the ZeroMQ sockets are no longer bound. 3) Network control: enforce firewall rules or Kubernetes NetworkPolicies restricting ZeroMQ port access to trusted inter-node IPs only; block all external access to the ZeroMQ ports (the port range varies by configuration — list listening sockets with `ss -tlnp` and match the vLLM process PIDs, or check vLLM logs for bind addresses). 4) Detection: alert on unexpected outbound connections from inference hosts, unexpected child processes spawned by the vLLM process, and anomalous network traffic on high-numbered TCP ports from GPU nodes. 5) Audit: inventory all vLLM deployments with `pip show vllm` and confirm mooncake usage in config files before assuming you are not affected.
What systems are affected by CVE-2025-32444?
This vulnerability affects the following AI/ML architecture patterns: Model serving / LLM inference, Distributed inference clusters, Multi-node GPU serving infrastructure, KV-cache offloading pipelines.
What is the CVSS score for CVE-2025-32444?
CVE-2025-32444 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 2.48%.
Technical Details
NVD Description
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5, having vLLM integration with mooncake, are vulnerable to remote code execution due to using pickle based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker is able to reach the vulnerable ZeroMQ sockets to carry out an attack. vLLM instances that do not make use of the mooncake integration are not vulnerable. This issue has been patched in version 0.8.5.
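The underlying fix pattern for this class of bug, independent of vLLM's actual patch (not reproduced here), is to replace pickle with a data-only format for control messages. A minimal sketch using JSON: unlike `pickle.loads`, `json.loads` can only ever reconstruct plain data structures, never invoke code, so a hostile peer on the socket gains no execution primitive.

```python
import json


def encode_meta(meta: dict) -> bytes:
    # Serialize only plain data (str/int/float/bool/list/dict);
    # callables and arbitrary objects are rejected at encode time
    return json.dumps(meta).encode("utf-8")


def decode_meta(raw: bytes) -> dict:
    # Deserialization can only produce data, never execute code,
    # which removes the RCE primitive that pickle provides
    return json.loads(raw.decode("utf-8"))
```

In a KV-transfer pipe this pattern would apply to the metadata messages exchanged over ZeroMQ; tensor payloads themselves can travel as raw bytes alongside, so no object serialization of untrusted input is needed at all.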
Exploitation Scenario
An attacker performs reconnaissance on an organization's AI inference infrastructure — either via external scanning (Shodan/Censys for ZeroMQ fingerprints) or after gaining foothold in the same VPC. They connect directly to the ZeroMQ socket bound on 0.0.0.0 on the vLLM inference node. They craft and send a malicious pickle payload encoding a Python object whose `__reduce__` method executes an arbitrary OS command. Because ZeroMQ is unauthenticated and pickle deserializes without validation, the command executes immediately in the context of the vLLM process — typically running with GPU access and broad filesystem permissions. From there, the attacker exfiltrates model weights, dumps environment variables containing cloud credentials or API keys, and establishes a reverse shell for persistent access to the inference cluster.
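The `__reduce__` mechanism described above can be demonstrated with a deliberately harmless payload. This sketch is illustrative only: the callable here is `os.getcwd`, where a real exploit would substitute something like `os.system` with an attacker-controlled command string.

```python
import os
import pickle


class Payload:
    def __reduce__(self):
        # pickle records this callable and its arguments; pickle.loads
        # invokes the callable during deserialization, before any
        # application code can inspect or validate the object
        return (os.getcwd, ())


blob = pickle.dumps(Payload())        # what the attacker sends over ZeroMQ
result = pickle.loads(blob)           # deserialization alone runs os.getcwd()
print(result)
```

No method of `Payload` is ever called by the receiver's application code; the mere act of unpickling executes the attacker-chosen callable, which is why pickle must never be used on untrusted network input.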
Weaknesses (CWE)
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

References
- github.com/advisories/GHSA-hj4w-hm2g-p6w5
- github.com/pypa/advisory-database/tree/main/vulns/vllm/PYSEC-2025-42.yaml
- nvd.nist.gov/vuln/detail/CVE-2025-32444
- github.com/vllm-project/vllm/blob/32b14baf8a1f7195ca09484de3008063569b43c5/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py Product
- github.com/vllm-project/vllm/commit/a5450f11c95847cf51a17207af9a3ca5ab569b2c Patch
- github.com/vllm-project/vllm/security/advisories/GHSA-hj4w-hm2g-p6w5 Exploit Vendor
- github.com/vllm-project/vllm/security/advisories/GHSA-x3m8-f7g5-qhm7 Not Applicable
- github.com/nomi-sec/PoC-in-GitHub Exploit
- github.com/stuxbench/vLLM-CVE-2025-32444 Exploit
Related Vulnerabilities
- CVE-2024-9053 (9.8) vllm: RCE via unsafe pickle deserialization in RPC server (same package: vllm)
- CVE-2026-25960 (9.8) vllm: SSRF allows internal network access (same package: vllm)
- CVE-2025-47277 (9.8) vLLM: RCE via exposed TCPStore in distributed inference (same package: vllm)
- CVE-2024-11041 (9.8) vllm: RCE via unsafe pickle deserialization in MessageQueue (same package: vllm)
- CVE-2026-22807 (9.8) vllm: Code Injection enables RCE (same package: vllm)