CVE-2025-32444: vLLM: RCE via pickle deserialization on ZeroMQ
GHSA-hj4w-hm2g-p6w5 | CRITICAL | PoC available | CISA SSVC: Track*
Any vLLM deployment that uses the mooncake KV-transfer integration on a version from 0.6.5 up to, but not including, 0.8.5 exposes an unauthenticated, network-accessible RCE primitive. Upgrade to 0.8.5 or later immediately. If you cannot patch today, disable the mooncake integration or firewall the ZeroMQ ports at the network perimeter. This is inference-server-level compromise: an attacker who reaches those sockets owns the host, the model weights, and potentially everything else reachable in the same network segment.
What is the risk?
Severity is effectively maximum for affected deployments. CVSS 9.8 with no authentication, no user interaction, and low attack complexity means any network-reachable vLLM+mooncake instance is trivially exploitable. The sockets listen on 0.0.0.0 (all interfaces), so cloud-hosted inference endpoints, Kubernetes clusters with improperly scoped network policies, and any externally reachable GPU node are all in scope. EPSS of 2.5% is relatively low today, but proof-of-concept code is already public and the bug is simple to exploit, so broader weaponization should be expected. There is no evidence of a CISA KEV listing yet, but the exploit surface is unusually clean.
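Because the exposure depends on sockets bound to all interfaces, one way to triage a host is to list its wildcard TCP listeners. The following is a minimal sketch that parses text in the column layout printed by `ss -tln` (or `ss -tlnp`). The function name `wildcard_listeners` and the sample output are illustrative assumptions, not part of vLLM or the advisory, and the ports shown are made up.

```python
# Hosts that mean "listening on every interface" in ss output.
WILDCARD_HOSTS = {"0.0.0.0", "*", "[::]", "::"}


def wildcard_listeners(ss_output: str) -> list[tuple[int, str]]:
    """Return (port, owner) pairs for TCP listeners bound to all interfaces.

    `ss_output` is expected to look like the output of `ss -tlnp`:
    State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port, Process.
    """
    found = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a listening socket.
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue
        # Split "host:port" on the last colon so IPv6 hosts stay intact.
        host, _, port = fields[3].rpartition(":")
        if host in WILDCARD_HOSTS and port.isdigit():
            owner = " ".join(fields[5:])  # empty when run without -p or privileges
            found.append((int(port), owner))
    return found


# Illustrative sample only; real ports and process names will differ.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0      100    0.0.0.0:5555       0.0.0.0:*         users:(("python3",pid=1234,fd=45))
LISTEN 0      128    127.0.0.1:8000     0.0.0.0:*
LISTEN 0      128    [::]:22            [::]:*
"""

if __name__ == "__main__":
    for port, owner in wildcard_listeners(SAMPLE):
        print(port, owner or "(owner not shown)")
```

On a live host the input could come from `subprocess.run(["ss", "-tlnp"], capture_output=True, text=True).stdout`. A wildcard listener is not proof of exposure by itself; it only tells you which ports deserve a closer look against your firewall rules.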
What systems are affected?
How severe is it?
What is the attack surface?
What should I do?
1) Patch: upgrade to vLLM >= 0.8.5. This is the only full remediation.
2) Workaround (if patching is blocked): disable the mooncake integration entirely via configuration, then confirm that the ZeroMQ sockets are no longer bound.
3) Network control: enforce firewall rules or Kubernetes NetworkPolicies that restrict ZeroMQ port access to trusted inter-node IPs, and block all external access to those ports. The port range varies by deployment; identify it with `ss -tlnp | grep zmq` or check the vLLM logs for bind addresses. If the `grep` matches nothing, review the full `ss -tlnp` output, because listeners are usually reported under the name of the owning Python process.
4) Detection: alert on unexpected outbound connections from inference hosts, unexpected processes spawned by the vLLM process, and anomalous network traffic on high-numbered TCP ports from GPU nodes.
5) Audit: inventory all vLLM deployments using `pip show vllm` and confirm mooncake usage in config files before assuming you are not affected.
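The audit step can be scripted. The sketch below checks whether a version string falls in the affected range stated in the advisory (0.6.5 or later, and earlier than 0.8.5). The helper names `parse_release`, `in_affected_range`, and `installed_vllm_status` are hypothetical. The parser deliberately ignores pre-release and local suffixes, so treat borderline results (for example a release candidate of 0.8.5) with caution.

```python
import re
from importlib import metadata

AFFECTED_MIN = (0, 6, 5)  # first affected release, per the advisory
FIXED = (0, 8, 5)         # first patched release, per the advisory


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '0.8.4.post1' -> (0, 8, 4).

    Suffixes such as 'rc1', '.post1' or '+cu121' are ignored (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as '0.7' to three components.
    return parts + (0,) * (3 - len(parts))


def in_affected_range(version: str) -> bool:
    """True when AFFECTED_MIN <= version < FIXED."""
    return AFFECTED_MIN <= parse_release(version) < FIXED


def installed_vllm_status() -> str:
    """Describe the locally installed vLLM package, if any."""
    try:
        version = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vllm is not installed in this environment"
    if in_affected_range(version):
        return f"vllm {version} is in the affected range; check for mooncake usage"
    return f"vllm {version} is outside the affected range"


if __name__ == "__main__":
    print(installed_vllm_status())
```

A version inside the range only means the vulnerable code is present. Per the advisory, instances that do not use the mooncake integration are not vulnerable, so the configuration review in step 5 is still needed.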
What does CISA's SSVC say?
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
How is it classified?
Which compliance frameworks are affected?
This CVE is relevant to:
Frequently Asked Questions
Is CVE-2025-32444 actively exploited?
Proof-of-concept exploit code is publicly available for CVE-2025-32444, increasing the risk of exploitation.
What systems are affected by CVE-2025-32444?
This vulnerability affects the following AI/ML architecture patterns: Model serving / LLM inference, Distributed inference clusters, Multi-node GPU serving infrastructure, KV-cache offloading pipelines.
What is the CVSS score for CVE-2025-32444?
CVE-2025-32444 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 1.47%.
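The 9.8 figure can be reproduced from the vector string CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H using the base-score equations in the CVSS v3.1 specification. The sketch below implements only the Scope Unchanged case, which is all this vector needs; the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope Unchanged values for PR).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a Scope Unchanged vector."""
    prefix, *pairs = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are supported")
    metrics = dict(pair.split(":") for pair in pairs)
    if metrics["S"] != "U":
        raise NotImplementedError("Scope Changed is not handled in this sketch")

    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 3.89; their sum, roughly 9.76, rounds up to 9.8.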
What is the AI security impact?
Affected AI Architectures
MITRE ATLAS Techniques
- AML.T0006 Active Scanning
- AML.T0010.001 AI Software
- AML.T0025 Exfiltration via Cyber Means
- AML.T0049 Exploit Public-Facing Application
- AML.T0072 Reverse Shell
Compliance Controls Affected
What are the technical details?
Original Advisory
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5, having vLLM integration with mooncake, are vulnerable to remote code execution due to using pickle based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker is able to reach the vulnerable ZeroMQ sockets to carry out an attack. vLLM instances that do not make use of the mooncake integration are not vulnerable. This issue has been patched in version 0.8.5.
Exploitation Scenario
An attacker performs reconnaissance on an organization's AI inference infrastructure — either via external scanning (Shodan/Censys for ZeroMQ fingerprints) or after gaining foothold in the same VPC. They connect directly to the ZeroMQ socket bound on 0.0.0.0 on the vLLM inference node. They craft and send a malicious pickle payload encoding a Python object whose `__reduce__` method executes an arbitrary OS command. Because ZeroMQ is unauthenticated and pickle deserializes without validation, the command executes immediately in the context of the vLLM process — typically running with GPU access and broad filesystem permissions. From there, the attacker exfiltrates model weights, dumps environment variables containing cloud credentials or API keys, and establishes a reverse shell for persistent access to the inference cluster.
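The reason `__reduce__` matters is that `pickle.loads` calls whatever callable the payload names while it is rebuilding the object. The sketch below shows the mechanism with a deliberately harmless callable (`sorted`), then shows the restricted-unpickler pattern described in the Python `pickle` documentation, which refuses to resolve any global. Class and function names are illustrative, and this is not the change made in the vLLM patch.

```python
import io
import pickle


class Harmless:
    """Stand-in for an attacker-controlled object; the callable here is benign."""

    def __reduce__(self):
        # pickle stores "call sorted([3, 1, 2])" instead of the object's state.
        return (sorted, ([3, 1, 2],))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that rejects every global lookup, so no callable can be named."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data: bytes):
    """Deserialize plain containers and scalars only."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    blob = pickle.dumps(Harmless())

    # The receiver never sees a Harmless instance; it gets the call's result.
    print(pickle.loads(blob))  # [1, 2, 3]

    try:
        safe_loads(blob)
    except pickle.UnpicklingError as exc:
        print("rejected:", exc)

    # Plain data still round-trips through the restricted loader.
    print(safe_loads(pickle.dumps({"tokens": [1, 2]})))
```

A restricted unpickler narrows the attack surface but is easy to get wrong once an allow-list grows. For data that crosses a network boundary, a format that cannot encode callables at all is the more robust choice.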
Weaknesses (CWE)
CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.
- [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
- [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
Source: MITRE CWE corpus.
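As an illustration of the two mitigations above, the sketch below seals a message with an HMAC-SHA256 tag and carries the data as JSON, so the receiver builds a fresh dictionary from validated bytes instead of unpickling. The `seal` and `unseal` helpers and the message layout are assumptions made for this example. Key distribution is out of scope, and this is not presented as the fix applied in vLLM 0.8.5.

```python
import hashlib
import hmac
import json

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def seal(payload: dict, key: bytes) -> bytes:
    """Serialize `payload` as JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def unseal(message: bytes, key: bytes) -> dict:
    """Verify the tag before parsing; raise ValueError on any mismatch."""
    if len(message) < TAG_LEN:
        raise ValueError("message too short to contain a tag")
    tag, body = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication tag mismatch")
    return json.loads(body)


if __name__ == "__main__":
    key = b"example-shared-secret"  # placeholder; load from a secret store in practice
    wire = seal({"request_id": 7, "shape": [2, 4]}, key)
    print(unseal(wire, key))

    tampered = wire[:-1] + bytes([wire[-1] ^ 0x01])
    try:
        unseal(tampered, key)
    except ValueError as exc:
        print("rejected:", exc)
```

Verifying the tag before parsing means unauthenticated bytes never reach the deserializer, and JSON limits what a parsed message can contain to plain data types.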
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
References
- github.com/advisories/GHSA-hj4w-hm2g-p6w5
- github.com/pypa/advisory-database/tree/main/vulns/vllm/PYSEC-2025-42.yaml
- nvd.nist.gov/vuln/detail/CVE-2025-32444
- github.com/vllm-project/vllm/blob/32b14baf8a1f7195ca09484de3008063569b43c5/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py [Product]
- github.com/vllm-project/vllm/commit/a5450f11c95847cf51a17207af9a3ca5ab569b2c [Patch]
- github.com/vllm-project/vllm/security/advisories/GHSA-hj4w-hm2g-p6w5 [Exploit, Vendor]
- github.com/vllm-project/vllm/security/advisories/GHSA-x3m8-f7g5-qhm7 [Not Applicable]
- github.com/nomi-sec/PoC-in-GitHub [Exploit]
- github.com/stuxbench/vLLM-CVE-2025-32444 [Exploit]
- github.com/stuxbench/vllm-cve-2025-32444 [Exploit]
Timeline
Related Vulnerabilities
- CVE-2024-9053 (CVSS 9.8): vllm: RCE via unsafe pickle deserialization in RPC server. Same package: vllm
- CVE-2024-11041 (CVSS 9.8): vllm: RCE via unsafe pickle deserialization in MessageQueue. Same package: vllm
- CVE-2026-25960 (CVSS 9.8): vllm: SSRF allows internal network access. Same package: vllm
- CVE-2025-47277 (CVSS 9.8): vLLM: RCE via exposed TCPStore in distributed inference. Same package: vllm
- CVE-2026-22807 (CVSS 9.8): vllm: Code Injection enables RCE. Same package: vllm