CVE-2025-32444: vLLM: RCE via pickle deserialization on ZeroMQ

GHSA-hj4w-hm2g-p6w5 CRITICAL PoC AVAILABLE CISA: TRACK*
Published April 30, 2025
CISO Take

Any vLLM deployment using the mooncake KV-transfer integration (versions 0.6.5 through 0.8.4, i.e. prior to the 0.8.5 patch) exposes an unauthenticated, network-accessible RCE primitive; patch to 0.8.5 immediately. If you cannot patch today, disable the mooncake integration or firewall the ZeroMQ ports at the network perimeter. This is inference-server-level compromise: an attacker who reaches those sockets owns the host, the model weights, and everything in the same network segment.

Risk Assessment

Severity is effectively maximum for affected deployments. CVSS 9.8 with no authentication, no user interaction, and low attack complexity means any network-reachable vLLM+mooncake instance is trivially exploitable. The sockets listen on 0.0.0.0 (all interfaces), so cloud-hosted inference endpoints, Kubernetes clusters with improperly scoped network policies, and any externally reachable GPU node are all in scope. EPSS of 2.5% is relatively low today, but the vulnerability is straightforward enough that weaponized exploits are a matter of days from public disclosure. No evidence of KEV listing yet, but the exploit surface is unusually clean.

Affected Systems

Package   Ecosystem   Vulnerable Range      Patched
vllm      pip         >= 0.6.5, < 0.8.5     0.8.5

Severity & Risk

CVSS 3.1: 9.8 / 10
EPSS: 2.5% chance of exploitation in 30 days (higher than 85% of all CVEs)
Exploitation Status: Exploit Available
Exploitation likelihood: Medium
Sophistication: Trivial
Exploitation confidence: Medium (public PoC indexed via trickest/cve)

Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV: Network
AC: Low
PR: None
UI: None
S: Unchanged
C: High
I: High
A: High

Recommended Action

1. Patch: upgrade to vLLM >= 0.8.5. This is the only full remediation.
2. Workaround (if patching is blocked): disable the mooncake integration entirely via configuration, and confirm the ZeroMQ sockets are no longer bound.
3. Network control: enforce firewall rules or Kubernetes NetworkPolicies that restrict ZeroMQ port access to trusted inter-node IPs only, and block all external access to those ports. The port range varies by configuration; identify bound ports with `ss -tlnp` on the host (the ZeroMQ sockets appear under the vLLM/python process name, not as a "zmq" process) or check vLLM logs for bind addresses.
4. Detection: alert on unexpected outbound connections from inference hosts, unexpected child processes spawned by the vLLM process, and anomalous traffic on high-numbered TCP ports from GPU nodes.
5. Audit: inventory all vLLM deployments with `pip show vllm` and confirm mooncake usage in config files before assuming you are not affected.
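For step 5, the version comparison can be scripted. The sketch below is a hypothetical audit helper (not a vLLM or pip API) that flags versions inside the advisory's vulnerable range, >= 0.6.5 and < 0.8.5. It assumes plain `X.Y.Z` version strings; real audits should use `packaging.version` to handle pre-release and post-release suffixes.

```python
def parse(v: str) -> tuple:
    """Naive X.Y.Z parser; assumes the first three dot-separated fields are ints."""
    return tuple(int(x) for x in v.split(".")[:3])

def is_vulnerable(version: str) -> bool:
    """True when the installed vLLM falls in the advisory's range >= 0.6.5, < 0.8.5."""
    return parse("0.6.5") <= parse(version) < parse("0.8.5")
```

Feed it the output of `pip show vllm` (the `Version:` field) on each host; remember that an in-range version is only exploitable when the mooncake integration is actually enabled.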

CISA SSVC Assessment

Decision: Track*
Exploitation: none
Automatable: Yes
Technical Impact: total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.


Compliance Impact

This CVE is relevant to:

EU AI Act
  Article 15 - Accuracy, robustness and cybersecurity
  Article 9 - Risk management system
ISO 42001
  A.6.2 - AI risk management process
  A.9.2 - Information security for AI systems
NIST AI RMF
  MANAGE 2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems
  MAP 5.1 - Likelihood and magnitude of each identified AI risk is assessed
OWASP LLM Top 10
  LLM03 - Supply Chain

Frequently Asked Questions

What is CVE-2025-32444?

CVE-2025-32444 is a critical remote code execution vulnerability in vLLM's mooncake KV-transfer integration (versions 0.6.5 through 0.8.4, i.e. prior to the 0.8.5 patch). It stems from pickle-based serialization over unauthenticated ZeroMQ sockets bound on all network interfaces, giving any attacker who can reach those sockets an unauthenticated RCE primitive on the inference host. vLLM instances that do not use the mooncake integration are not affected.

Is CVE-2025-32444 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2025-32444, increasing the risk of exploitation.

How to fix CVE-2025-32444?

1) Patch: upgrade to vLLM >= 0.8.5; this is the only full remediation. 2) Workaround (if patching is blocked): disable the mooncake integration entirely via configuration, and confirm the ZeroMQ sockets are no longer bound. 3) Network control: enforce firewall rules or Kubernetes NetworkPolicies restricting ZeroMQ port access to trusted inter-node IPs only, and block all external access to those ports (the port range varies by configuration; identify bound ports with `ss -tlnp` on the host, where the sockets appear under the vLLM/python process name, or check vLLM logs for bind addresses). 4) Detection: alert on unexpected outbound connections from inference hosts, unexpected child processes spawned by the vLLM process, and anomalous traffic on high-numbered TCP ports from GPU nodes. 5) Audit: inventory all vLLM deployments using `pip show vllm` and confirm mooncake usage in config files before assuming you are not affected.

What systems are affected by CVE-2025-32444?

This vulnerability affects the following AI/ML architecture patterns: Model serving / LLM inference, Distributed inference clusters, Multi-node GPU serving infrastructure, KV-cache offloading pipelines.

What is the CVSS score for CVE-2025-32444?

CVE-2025-32444 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 2.48%.

Technical Details

NVD Description

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5, having vLLM integration with mooncake, are vulnerable to remote code execution due to using pickle based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker is able to reach the vulnerable ZeroMQ sockets to carry out an attack. vLLM instances that do not make use of the mooncake integration are not vulnerable. This issue has been patched in version 0.8.5.
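The root cause is generic to `pickle`: any service that unpickles untrusted bytes executes attacker-chosen callables. The upstream 0.8.5 patch fixes this in vLLM itself; as general defense-in-depth for any channel that must carry pickled data, Python's documented pattern is a restricted unpickler that overrides `find_class` to allow-list what may be resolved. The sketch below is illustrative, not part of the vLLM fix; `ALLOWED` is a hypothetical allow-list you would tailor to your own payloads.

```python
import io
import pickle

# Hypothetical allow-list: only these (module, name) globals may be resolved.
# Plain containers (lists, dicts, strings, ints) need no globals at all.
ALLOWED = {("builtins", "frozenset")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever a pickle stream references a global (class/function).
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_loads(data: bytes):
    """Deserialize with the allow-list enforced instead of bare pickle.loads."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

With this in place, plain data round-trips normally, while a payload that references `os.system`, `eval`, or any other unlisted callable raises `UnpicklingError` instead of executing. Allow-listing mitigates but does not eliminate pickle's risk; switching to a data-only format such as JSON or msgpack is the stronger fix.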

Exploitation Scenario

An attacker performs reconnaissance on an organization's AI inference infrastructure — either via external scanning (Shodan/Censys for ZeroMQ fingerprints) or after gaining foothold in the same VPC. They connect directly to the ZeroMQ socket bound on 0.0.0.0 on the vLLM inference node. They craft and send a malicious pickle payload encoding a Python object whose `__reduce__` method executes an arbitrary OS command. Because ZeroMQ is unauthenticated and pickle deserializes without validation, the command executes immediately in the context of the vLLM process — typically running with GPU access and broad filesystem permissions. From there, the attacker exfiltrates model weights, dumps environment variables containing cloud credentials or API keys, and establishes a reverse shell for persistent access to the inference cluster.
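The `__reduce__` mechanism the scenario describes can be demonstrated in a few lines. `Payload` here is a hypothetical stand-in for the attacker's object: `__reduce__` tells pickle how to "reconstruct" it, and may name any importable callable, which `pickle.loads` then invokes with no validation. A real exploit would return something like `(os.system, ("<command>",))`; this sketch uses `eval` on a harmless expression to show the arbitrary-execution primitive.

```python
import pickle

class Payload:
    def __reduce__(self):
        # Any importable callable + arguments; eval of a benign expression
        # stands in for an OS command here.
        return (eval, ("__import__('os').getcwd()",))

blob = pickle.dumps(Payload())   # bytes an attacker would send over the ZeroMQ socket
result = pickle.loads(blob)      # the callable executes here, before any validation
```

Note that the execution happens inside `pickle.loads` itself, not when `result` is later used: by the time the receiving code sees the object, the attacker's callable has already run in the vLLM process's context.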

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

Timeline

Published
April 30, 2025
Last Modified
May 29, 2025
First Seen
April 30, 2025

Related Vulnerabilities