CVE-2025-29783: vLLM: RCE via unsafe deserialization in Mooncake KV

GHSA-x3m8-f7g5-qhm7 · CRITICAL · PoC available · CISA SSVC: Track*
Published March 19, 2025
CISO Take

Any vLLM deployment running Mooncake for distributed KV cache (v0.6.5–v0.7.x) is exposed to unauthenticated RCE from any adjacent-network host with zero user interaction. Patch to v0.8.0 immediately—this is trivial to exploit once network-adjacent and no special AI knowledge is required. If patching is blocked, disable Mooncake KV distribution and isolate ZMQ/TCP inference ports at the network layer until remediation is complete.
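The vulnerable range can be checked mechanically during triage. A minimal sketch, assuming plain dotted-integer version strings (real vLLM releases may carry rc/post suffixes, which would need `packaging.version` instead):

```python
# Hedged sketch: flag vLLM versions inside the advisory's vulnerable range
# (>= 0.6.5, < 0.8.0). The dotted-integer parsing below is a simplification.

def parse(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))

def is_vulnerable(version: str) -> bool:
    return parse("0.6.5") <= parse(version) < parse("0.8.0")

if __name__ == "__main__":
    for v in ("0.6.4", "0.6.5", "0.7.3", "0.8.0"):
        print(v, "vulnerable" if is_vulnerable(v) else "ok")
```

Running this against `pip show vllm` output (or your lockfiles) gives a quick fleet-wide exposure inventory before patching begins.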

Risk Assessment

Critical operational risk despite AV:A scope. Distributed LLM inference clusters typically share flat internal network segments with CI/CD systems, data pipelines, and developer workstations—making 'adjacent network' far easier to reach than perimeter controls suggest. Attack complexity is low, privileges required are minimal, and no user interaction is needed. A single compromised inference node provides full cluster access, model weights, cached inference data containing potentially sensitive prompts and responses, and lateral movement paths to adjacent infrastructure. Organizations running large-scale inference farms should treat this as P0.

Affected Systems

Package Ecosystem Vulnerable Range Patched
vllm pip >= 0.6.5, < 0.8.0 0.8.0

Severity & Risk

CVSS 3.1: 9.0 / 10 (Critical)
EPSS: 2.8% chance of exploitation within 30 days (higher than 86% of all CVEs)
Exploitation status: exploit available; estimated likelihood medium
Sophistication: trivial
Exploitation confidence: medium (public PoC indexed via trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV (Attack Vector): Adjacent
AC (Attack Complexity): Low
PR (Privileges Required): Low
UI (User Interaction): None
S (Scope): Changed
C (Confidentiality): High
I (Integrity): High
A (Availability): High

Recommended Action

6 steps
  1. PATCH

    Upgrade vLLM to >= 0.8.0 immediately. This is the only complete fix per the advisory.

  2. WORKAROUND

    If patching is blocked, disable Mooncake KV distribution entirely; fall back to single-node inference or an alternative KV backend.

  3. NETWORK SEGMENTATION

    Apply strict firewall rules on ZMQ/TCP ports used by Mooncake. Only authenticated inference cluster nodes should reach these endpoints—block all other sources including developer and CI/CD networks.

  4. ISOLATION

    Place inference nodes in a dedicated network segment with no direct access from developer machines, containers, or build pipelines.

  5. DETECTION

    Monitor vLLM worker processes for anomalous outbound connections, unexpected child process spawns, or unauthorized file system access. Alert on any new listening sockets opened by inference processes.

  6. AUDIT

    Rotate any credentials, API keys, or tokens accessible from inference node environments as a precaution if exposure cannot be ruled out.
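Steps 3–4 can be verified from the outside in. A minimal sketch, assuming the operator knows the Mooncake endpoint's host and port (the target values below are placeholders): run it from a host that should be blocked and expect `False` once segmentation is in place.

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Placeholder target: substitute the actual Mooncake ZMQ/TCP endpoint.
    print(port_reachable("127.0.0.1", 65000))
```

Note this only confirms TCP reachability; it does not probe the deserialization flaw itself, so it is safe to run from developer and CI/CD networks as a segmentation check.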

CISA SSVC Assessment

Decision Track*
Exploitation none
Automatable Yes
Technical Impact total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
Article 9 - Risk management system
ISO 42001
A.7.4 - Risk treatment
A.9.3 - AI system security within lifecycle
NIST AI RMF
GOVERN-1.7 - Processes for AI risk and impact assessment
MANAGE-2.2 - Mechanisms are in place to sustain value of deployed AI systems
OWASP LLM Top 10
LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2025-29783?

CVE-2025-29783 is an unsafe-deserialization vulnerability in vLLM's Mooncake integration for distributed KV cache (v0.6.5–v0.7.x). The deserialization endpoint is exposed over ZMQ/TCP on all network interfaces, allowing unauthenticated remote code execution from any adjacent-network host with zero user interaction. It is fixed in v0.8.0.

Is CVE-2025-29783 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2025-29783, increasing the risk of exploitation.

How to fix CVE-2025-29783?

1. PATCH: Upgrade vLLM to >= 0.8.0 immediately; this is the only complete fix per the advisory.
2. WORKAROUND (if patching is blocked): Disable Mooncake KV distribution entirely; fall back to single-node inference or an alternative KV backend.
3. NETWORK SEGMENTATION: Apply strict firewall rules on the ZMQ/TCP ports used by Mooncake; only authenticated inference cluster nodes should reach these endpoints, with all other sources blocked, including developer and CI/CD networks.
4. ISOLATION: Place inference nodes in a dedicated network segment with no direct access from developer machines, containers, or build pipelines.
5. DETECTION: Monitor vLLM worker processes for anomalous outbound connections, unexpected child process spawns, or unauthorized file system access; alert on any new listening sockets opened by inference processes.
6. AUDIT: Rotate any credentials, API keys, or tokens accessible from inference node environments if exposure cannot be ruled out.

What systems are affected by CVE-2025-29783?

This vulnerability affects the following AI/ML architecture patterns: distributed LLM inference, model serving, multi-node GPU inference clusters, LLM inference pipelines.

What is the CVSS score for CVE-2025-29783?

CVE-2025-29783 has a CVSS v3.1 base score of 9.0 (CRITICAL). The EPSS exploitation probability is 2.81%.

Technical Details

NVD Description

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. When vLLM is configured to use Mooncake, unsafe deserialization exposed directly over ZMQ/TCP on all network interfaces will allow attackers to execute remote code on distributed hosts. This is a remote code execution vulnerability impacting any deployments using Mooncake to distribute KV across distributed hosts. This vulnerability is fixed in 0.8.0.
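The root cause generalizes beyond vLLM: Python's pickle protocol executes attacker-chosen callables during loading. A minimal illustration (not vLLM's actual code) of why deserializing untrusted bytes is equivalent to code execution, contrasted with a data-only format:

```python
import json
import pickle

class Rigged:
    """Stand-in for a malicious serialized object (illustration only)."""
    def __reduce__(self):
        # pickle invokes this callable with these args during loading; a
        # real attacker would substitute os.system or similar.
        return (eval, ("1 + 1",))

payload = pickle.dumps(Rigged())
result = pickle.loads(payload)   # the attacker-chosen call runs here -> 2

# A schema-bound format like JSON yields only plain data types and cannot
# trigger calls during deserialization.
safe = json.loads('{"kv_block": [1, 2, 3]}')
```

This is why the advisory treats network exposure of the deserialization endpoint as directly equivalent to RCE: no memory-corruption exploit is needed, only a crafted byte stream.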

Exploitation Scenario

An attacker with access to the same internal network segment as a vLLM cluster—via a compromised developer laptop, a rogue container in the same Kubernetes namespace, or an insider—scans for open ZMQ/TCP ports on inference worker nodes. Using a crafted pickle payload or other malicious serialized object, the attacker sends it directly to the Mooncake ZMQ endpoint. The unsafe deserialization triggers arbitrary code execution under the vLLM worker process. The attacker then: establishes a reverse shell for persistent access, harvests model weights and KV cache contents (which may include thousands of prior user prompts), extracts API keys or cloud credentials from the process environment, pivots laterally to other cluster nodes, and optionally backdoors the inference pipeline to manipulate LLM outputs for downstream applications without detection.
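Rather than reproducing the attack, the scenario above suggests a detection heuristic: an inference worker has no legitimate reason to spawn shells or network tools. A minimal allowlist sketch (the process names are assumptions for illustration, not vLLM specifics):

```python
# Hypothetical allowlist of child-process names for an inference worker.
EXPECTED_CHILDREN = {"python", "python3"}

def anomalous_children(observed: set) -> set:
    """Return observed child-process names that fall outside the allowlist."""
    return set(observed) - EXPECTED_CHILDREN

# In production, the observed set could be fed from a process-monitoring
# agent (e.g. psutil's children() on the worker PID).
```

A hit on names like `bash` or `curl` maps directly to the reverse-shell and credential-exfiltration stages of the scenario above.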

CVSS Vector

CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H
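For cross-checking the vector against the Attack Surface metrics listed earlier, the string decomposes mechanically:

```python
def parse_cvss(vector: str) -> dict:
    """Split a CVSS v3.x vector string into {metric: value} pairs."""
    metrics = vector.split("/")[1:]          # drop the "CVSS:3.1" prefix
    return dict(m.split(":", 1) for m in metrics)

metrics = parse_cvss("CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H")
```

Each pair (AV:A, AC:L, PR:L, UI:N, S:C, C:H, I:H, A:H) corresponds one-to-one to the metric rows in the Attack Surface section.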

Timeline

Published
March 19, 2025
Last Modified
July 2, 2025
First Seen
March 19, 2025

Related Vulnerabilities