CVE-2025-32444: vLLM: RCE via pickle deserialization on ZeroMQ

GHSA-hj4w-hm2g-p6w5 | Severity: CRITICAL | PoC available | CISA SSVC: Track*
Published April 30, 2025
CISO Take

Any vLLM deployment that uses the mooncake KV-transfer integration on versions >= 0.6.5 and < 0.8.5 exposes an unauthenticated, network-accessible RCE primitive. Upgrade to 0.8.5 or later immediately. If you cannot patch today, disable the mooncake integration or firewall the ZeroMQ ports at the network perimeter. This is inference-server-level compromise: an attacker who reaches those sockets can run code as the vLLM process, read the model weights, and pivot to anything reachable from that host in the same network segment.

What is the risk?

Severity is near maximum for affected deployments. CVSS 9.8 with no authentication, no user interaction, and low attack complexity means any network-reachable vLLM+mooncake instance is trivially exploitable. The sockets listen on 0.0.0.0 (all interfaces), so cloud-hosted inference endpoints, Kubernetes clusters with improperly scoped network policies, and any externally reachable GPU node are all in scope. EPSS of about 1.5% is relatively low today, but a public PoC is already indexed and the vulnerability is straightforward to weaponize. There is no evidence of a KEV listing yet, but the exploit surface is unusually clean.

What systems are affected?

Package   Ecosystem   Vulnerable Range      Patched
vLLM      pip         >= 0.6.5, < 0.8.5     0.8.5

Package profile: 83.4K, 130 dependents, pushed 3d ago, 34% patched, ~32d to patch

How severe is it?

CVSS 3.1: 9.8 / 10
EPSS: 1.5% chance of exploitation in 30 days (higher than 70% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

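Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High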

What should I do?

  1. Patch: upgrade to vLLM >= 0.8.5. This is the only full remediation.
  2. Workaround (if patching is blocked): disable the mooncake integration entirely via configuration, then confirm the ZeroMQ sockets are no longer bound.
  3. Network control: enforce firewall rules or Kubernetes NetworkPolicies that restrict ZeroMQ port access to trusted inter-node IPs, and block all external access to those ports. The port range varies by deployment; identify it by running ss -tlnp and looking for listeners owned by the vLLM Python process, or by checking the vLLM logs for bind addresses.
  4. Detection: alert on unexpected outbound connections from inference hosts, unexpected child processes spawned by the vLLM process, and anomalous traffic on high-numbered TCP ports from GPU nodes.
  5. Audit: inventory all vLLM deployments using pip show vllm, and confirm whether mooncake is enabled in their config files before assuming you are not affected.
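The audit step can be scripted. The sketch below is a minimal example, not an official vLLM tool: it compares a version string against the affected range (>= 0.6.5, < 0.8.5) and reads the installed version through `importlib.metadata`. The helper names are hypothetical, and the parser deliberately ignores pre-release and post-release suffixes, so treat a borderline result such as a 0.8.5 release candidate with care. A version inside the range is only exploitable if the mooncake integration is in use.

```python
import re
from importlib import metadata

FIRST_AFFECTED = (0, 6, 5)   # first vulnerable release per the advisory
FIRST_PATCHED = (0, 8, 5)    # first fixed release per the advisory


def parse_release(version: str) -> tuple:
    """Extract the leading numeric release (major, minor, patch) from a version string.

    Assumption: suffixes such as 'rc1' or '.post1' are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def in_affected_range(version: str) -> bool:
    """Return True if the version falls in >= 0.6.5, < 0.8.5."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_PATCHED


def installed_vllm_status() -> str:
    """Report whether the vLLM installed in this environment is in the affected range."""
    try:
        version = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vllm is not installed in this environment"
    if in_affected_range(version):
        return f"vllm {version} is in the affected range; check for mooncake usage"
    return f"vllm {version} is outside the affected range"


if __name__ == "__main__":
    print(installed_vllm_status())
```

Run it inside each environment or container image that serves models; a result outside the affected range means no action is needed for this CVE on that host.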

What does CISA's SSVC say?

Decision: Track*
Exploitation: none
Automatable: Yes
Technical Impact: total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
  • Article 15 - Accuracy, robustness and cybersecurity
  • Article 9 - Risk management system
ISO 42001
  • A.6.2 - AI risk management process
  • A.9.2 - Information security for AI systems
NIST AI RMF
  • MANAGE 2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems
  • MAP 5.1 - Likelihood and magnitude of each identified AI risk is assessed
OWASP LLM Top 10
  • LLM03 - Supply Chain

Frequently Asked Questions

What is CVE-2025-32444?

CVE-2025-32444 is a critical remote code execution vulnerability in vLLM, an inference and serving engine for LLMs. In versions >= 0.6.5 and < 0.8.5, the mooncake KV-transfer integration exchanges pickle-serialized data over unauthenticated ZeroMQ sockets that listen on all network interfaces. An attacker who can reach those sockets can send a crafted pickle payload and execute arbitrary code as the vLLM process. Deployments that do not use the mooncake integration are not vulnerable. The issue is fixed in version 0.8.5.

Is CVE-2025-32444 actively exploited?

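No active exploitation has been reported in the sources tracked here: CISA's SSVC assessment lists exploitation as "none", and there is no evidence of a KEV listing. However, proof-of-concept exploit code for CVE-2025-32444 is publicly available, which increases the risk of exploitation.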

How to fix CVE-2025-32444?

Upgrade to vLLM >= 0.8.5; this is the only full remediation. If patching is blocked, disable the mooncake integration entirely via configuration and confirm the ZeroMQ sockets are no longer bound. In either case, use firewall rules or Kubernetes NetworkPolicies to restrict ZeroMQ port access to trusted inter-node IPs and block all external access; the port range varies by deployment, so identify it by running `ss -tlnp` and looking for listeners owned by the vLLM Python process, or by checking the vLLM logs for bind addresses. For detection, alert on unexpected outbound connections from inference hosts, unexpected child processes spawned by the vLLM process, and anomalous traffic on high-numbered TCP ports from GPU nodes. Finally, inventory all vLLM deployments using `pip show vllm` and confirm whether mooncake is enabled in their config files before assuming you are not affected.

What systems are affected by CVE-2025-32444?

This vulnerability affects the following AI/ML architecture patterns: Model serving / LLM inference, Distributed inference clusters, Multi-node GPU serving infrastructure, KV-cache offloading pipelines.

What is the CVSS score for CVE-2025-32444?

CVE-2025-32444 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 1.47%.

What is the AI security impact?

Affected AI Architectures

  • Model serving / LLM inference
  • Distributed inference clusters
  • Multi-node GPU serving infrastructure
  • KV-cache offloading pipelines

MITRE ATLAS Techniques

AML.T0006 Active Scanning
AML.T0010.001 AI Software
AML.T0025 Exfiltration via Cyber Means
AML.T0049 Exploit Public-Facing Application
AML.T0072 Reverse Shell

Compliance Controls Affected

EU AI Act: Article 15, Article 9
ISO 42001: A.6.2, A.9.2
NIST AI RMF: MANAGE 2.2, MAP 5.1
OWASP LLM Top 10: LLM03

What are the technical details?

Original Advisory

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5, having vLLM integration with mooncake, are vulnerable to remote code execution due to using pickle based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker is able to reach the vulnerable ZeroMQ sockets to carry out an attack. vLLM instances that do not make use of the mooncake integration are not vulnerable. This issue has been patched in version 0.8.5.
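Because the advisory singles out sockets listening on all network interfaces, a quick host-level check is to look for TCP listeners bound to a wildcard address. The sketch below is a minimal example under stated assumptions: it parses text in the column layout printed by `ss -tln` (state, queues, local address, peer address), and the function name and sample port numbers are hypothetical. It flags every wildcard listener, not only vLLM's, so the results still need to be matched to the vLLM process by hand.

```python
WILDCARD_HOSTS = {"0.0.0.0", "*", "[::]"}


def wildcard_listeners(ss_output: str) -> list:
    """Return local address:port strings for listeners bound to all interfaces.

    Assumes the column layout of `ss -tln`:
    State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  [Process]
    """
    hits = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a listening socket.
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        local = fields[3]
        host, _, _port = local.rpartition(":")
        if host in WILDCARD_HOSTS:
            hits.append(local)
    return hits


if __name__ == "__main__":
    # Hypothetical sample output; real port numbers depend on the deployment.
    sample = (
        "State  Recv-Q Send-Q Local Address:Port Peer Address:Port\n"
        "LISTEN 0      100    0.0.0.0:5557       0.0.0.0:*\n"
        "LISTEN 0      128    127.0.0.1:8000     0.0.0.0:*\n"
    )
    print(wildcard_listeners(sample))
```

In practice the input would come from running `ss -tln` on the inference host; any wildcard listener owned by the vLLM process on an affected version deserves a firewall rule until the upgrade lands.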

Exploitation Scenario

An attacker performs reconnaissance on an organization's AI inference infrastructure, either through external scanning (for example, Shodan or Censys searches for ZeroMQ fingerprints) or after gaining a foothold in the same VPC. They connect directly to the ZeroMQ socket bound to 0.0.0.0 on the vLLM inference node and send a malicious pickle payload: a serialized Python object whose `__reduce__` method names an arbitrary OS command to run. Because the socket requires no authentication and pickle invokes that callable during deserialization without any validation, the command executes immediately in the context of the vLLM process, which typically runs with GPU access and broad filesystem permissions. From there, the attacker can exfiltrate model weights, dump environment variables containing cloud credentials or API keys, and establish a reverse shell for persistent access to the inference cluster.
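The role of `__reduce__` in this scenario is easier to see in a harmless, local demonstration. The sketch below is not the public PoC and involves no network traffic: the `Demo` class is hypothetical, and the callable it names is the built-in `len`, chosen because it has no side effects. The point is that `pickle.loads` calls whatever function the byte stream names, with arguments the sender chose, and the receiver never sees an instance of the original class.

```python
import pickle


class Demo:
    """Hypothetical class whose pickled form names a callable to run on load."""

    def __reduce__(self):
        # pickle stores "call len('abc')" instead of the object's state.
        # An attacker would name an OS command runner here; len is harmless.
        return (len, ("abc",))


# The sender serializes the object; the bytes carry the callable and its arguments.
blob = pickle.dumps(Demo())

# The receiver only calls pickle.loads, yet len("abc") runs during deserialization.
result = pickle.loads(blob)

if __name__ == "__main__":
    print(type(result).__name__, result)
```

The receiver gets back the return value of the named callable rather than a `Demo` object, which is why validating data after unpickling comes too late: the call has already happened.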

Weaknesses (CWE)

CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

  • [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
  • [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.

Source: MITRE CWE corpus.
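The two CWE-502 mitigations listed above can be combined in a few lines. The sketch below is a minimal illustration, not the fix shipped in vLLM 0.8.5: it assumes both peers share a secret key distributed out of band, replaces pickle with JSON so that deserialization only ever produces plain data, and prepends an HMAC-SHA256 tag that is verified before the body is parsed. The function names and the frame layout (tag followed by body) are hypothetical.

```python
import hashlib
import hmac
import json

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(key: bytes, message: dict) -> bytes:
    """Serialize a message as JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def open_sealed(key: bytes, frame: bytes) -> dict:
    """Verify the tag, then parse the body. Raises ValueError on any mismatch."""
    if len(frame) < TAG_LEN:
        raise ValueError("frame too short to contain a MAC")
    tag, body = frame[:TAG_LEN], frame[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC verification failed; frame rejected")
    # JSON parsing yields only dicts, lists, strings, numbers, booleans and None.
    return json.loads(body)


if __name__ == "__main__":
    key = b"example-shared-secret"  # assumption: provisioned out of band
    frame = seal(key, {"op": "kv_transfer", "block": 7})
    print(open_sealed(key, frame))
```

A tampered or unauthenticated frame is rejected before any parsing takes place, and even an authenticated peer can only deliver data, not callables. This does not replace network-level restrictions on the sockets.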

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
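To see how this vector yields the 9.8 base score, the sketch below applies the CVSS v3.1 base-score equations for the Scope: Unchanged case. It is a minimal calculator for illustration: the metric weights and the round-up rule follow the CVSS v3.1 specification, the function names are this example's own, and vectors with changed scope are deliberately not handled.

```python
import math

# CVSS v3.1 metric weights (Privileges Required values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a Scope: Unchanged vector string."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    metrics = dict(part.split(":") for part in parts)
    if metrics["S"] != "U":
        raise NotImplementedError("this sketch covers Scope: Unchanged only")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 3.89; their sum of roughly 9.76 rounds up to 9.8.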

Timeline

Published: April 30, 2025
Last Modified: May 29, 2025
First Seen: April 30, 2025

Related Vulnerabilities