CVE-2025-29783: vLLM: RCE via unsafe deserialization in Mooncake KV

GHSA-x3m8-f7g5-qhm7 | CRITICAL | PoC AVAILABLE | CISA: TRACK*
Published March 19, 2025

CISO Take

Any vLLM deployment running Mooncake for distributed KV cache (v0.6.5–v0.7.x) is exposed to unauthenticated RCE from any adjacent-network host with zero user interaction. Patch to v0.8.0 immediately—this is trivial to exploit once network-adjacent and no special AI knowledge is required. If patching is blocked, disable Mooncake KV distribution and isolate ZMQ/TCP inference ports at the network layer until remediation is complete.

What is the risk?

Critical operational risk despite the adjacent-network attack vector (AV:A). Distributed LLM inference clusters typically share flat internal network segments with CI/CD systems, data pipelines, and developer workstations, so an 'adjacent network' position is far easier to reach than perimeter controls suggest. Attack complexity is low, only low privileges are required (PR:L), and no user interaction is needed. A single compromised inference node can give an attacker full cluster access, model weights, cached inference data that may contain sensitive prompts and responses, and lateral movement paths to adjacent infrastructure. Organizations running large-scale inference farms should treat this as P0.

What systems are affected?

Package   Ecosystem   Vulnerable Range     Patched
vLLM      pip         -                    No patch
vLLM      pip         >= 0.6.5, < 0.8.0    0.8.0

Package stats (vLLM): 83.4K · 130 dependents · 34% patched · ~32d to patch
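
The vulnerable range above can be checked mechanically. The following is a minimal sketch, assuming the installed distribution is named `vllm` and that its version string starts with plain `X.Y.Z` release numbers; the helper names are illustrative and not part of vLLM. A version inside the range only indicates exposure when Mooncake KV distribution is actually configured.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Range taken from the table above: >= 0.6.5, < 0.8.0
FIRST_VULNERABLE = (0, 6, 5)
FIRST_PATCHED = (0, 8, 0)


def parse_release(version_string: str) -> tuple:
    """Return the leading (major, minor, patch) numbers.

    Suffixes such as '.post1' or '+cu121' are ignored. Pre-release tags
    (for example 'rc1') are also ignored, so treat pre-releases of the
    patched version with extra caution.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def in_vulnerable_range(version_string: str) -> bool:
    """True if the version falls in the advisory's affected range."""
    return FIRST_VULNERABLE <= parse_release(version_string) < FIRST_PATCHED


def installed_vllm_status() -> str:
    """Describe the locally installed vllm package, if any."""
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        return "vllm is not installed in this environment"
    if in_vulnerable_range(installed):
        return f"vllm {installed} is in the affected range; upgrade to >= 0.8.0"
    return f"vllm {installed} is outside the affected range"


if __name__ == "__main__":
    print(installed_vllm_status())
```

Running the script inside each inference node's Python environment gives a quick inventory signal; it does not replace checking whether Mooncake is enabled.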

How severe is it?

CVSS 3.1: 9.0 / 10
EPSS: 0.8% chance of exploitation in 30 days (higher than 52% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

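AV (Attack Vector): Adjacent
AC (Attack Complexity): Low
PR (Privileges Required): Low
UI (User Interaction): None
S (Scope): Changed
C (Confidentiality): High
I (Integrity): High
A (Availability): High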

What should I do?

6 steps
  1. PATCH

    Upgrade vLLM to >= 0.8.0 immediately. This is the only complete fix per the advisory.

  2. WORKAROUND (if patching is blocked)

    Disable Mooncake KV distribution entirely; fall back to single-node inference or an alternative KV backend.

  3. NETWORK SEGMENTATION

    Apply strict firewall rules on ZMQ/TCP ports used by Mooncake. Only authenticated inference cluster nodes should reach these endpoints—block all other sources including developer and CI/CD networks.

  4. ISOLATION

    Place inference nodes in a dedicated network segment with no direct access from developer machines, containers, or build pipelines.

  5. DETECTION

    Monitor vLLM worker processes for anomalous outbound connections, unexpected child process spawns, or unauthorized file system access. Alert on any new listening sockets opened by inference processes.

  6. AUDIT

    Rotate any credentials, API keys, or tokens accessible from inference node environments as a precaution if exposure cannot be ruled out.
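
The segmentation and isolation steps above can be spot-checked from a host that should not be able to reach the inference cluster. The following is a minimal sketch using only the standard library; the host address and port in the usage comment are placeholders, since the advisory does not list the ports Mooncake uses in a given deployment. Run it only against systems you are authorized to test.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    A successful connect from a developer or CI/CD network suggests the
    firewall rules are not blocking that path.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def exposed_endpoints(targets, timeout: float = 2.0):
    """Return the subset of (host, port) pairs that accept connections."""
    return [(h, p) for h, p in targets if port_reachable(h, p, timeout)]


# Example usage (placeholder address and port, not taken from the advisory):
#   for host, port in exposed_endpoints([("10.0.0.12", 5555)]):
#       print(f"reachable: {host}:{port}")
```

An empty result from a non-cluster host is the expected outcome once the firewall rules are in place; a timeout and a refused connection are both treated as "not reachable" here.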

What does CISA's SSVC say?

Decision: Track*
Exploitation: none
Automatable: Yes
Technical Impact: total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
  • Article 15 - Accuracy, robustness and cybersecurity
  • Article 9 - Risk management system
ISO 42001
  • A.7.4 - Risk treatment
  • A.9.3 - AI system security within lifecycle
NIST AI RMF
  • GOVERN-1.7 - Processes for AI risk and impact assessment
  • MANAGE-2.2 - Mechanisms are in place to sustain value of deployed AI systems
OWASP LLM Top 10
  • LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2025-29783?

CVE-2025-29783 is a critical remote code execution vulnerability in vLLM. When vLLM is configured to use Mooncake for distributed KV cache, unsafe deserialization is exposed over ZMQ/TCP on all network interfaces, so a host on an adjacent network can execute code on the inference nodes with no user interaction. Versions >= 0.6.5 and < 0.8.0 are affected; the fix is in 0.8.0. If patching is blocked, disable Mooncake KV distribution and isolate ZMQ/TCP inference ports at the network layer until remediation is complete.

Is CVE-2025-29783 actively exploited?

CISA's SSVC assessment records exploitation as "none", but proof-of-concept exploit code for CVE-2025-29783 is publicly available, which increases the risk of exploitation.

How to fix CVE-2025-29783?

  1. PATCH: Upgrade vLLM to >= 0.8.0 immediately. This is the only complete fix per the advisory.
  2. WORKAROUND (if patching is blocked): Disable Mooncake KV distribution entirely; fall back to single-node inference or an alternative KV backend.
  3. NETWORK SEGMENTATION: Apply strict firewall rules on ZMQ/TCP ports used by Mooncake. Only authenticated inference cluster nodes should reach these endpoints; block all other sources, including developer and CI/CD networks.
  4. ISOLATION: Place inference nodes in a dedicated network segment with no direct access from developer machines, containers, or build pipelines.
  5. DETECTION: Monitor vLLM worker processes for anomalous outbound connections, unexpected child process spawns, or unauthorized file system access. Alert on any new listening sockets opened by inference processes.
  6. AUDIT: Rotate any credentials, API keys, or tokens accessible from inference node environments as a precaution if exposure cannot be ruled out.

What systems are affected by CVE-2025-29783?

This vulnerability affects the following AI/ML architecture patterns: distributed LLM inference, model serving, multi-node GPU inference clusters, LLM inference pipelines.

What is the CVSS score for CVE-2025-29783?

CVE-2025-29783 has a CVSS v3.1 base score of 9.0 (CRITICAL). The EPSS exploitation probability is 0.82%.

What is the AI security impact?

Affected AI Architectures

distributed LLM inference, model serving, multi-node GPU inference clusters, LLM inference pipelines

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0025 Exfiltration via Cyber Means
AML.T0035 AI Artifact Collection
AML.T0049 Exploit Public-Facing Application
AML.T0072 Reverse Shell

Compliance Controls Affected

EU AI Act: Article 15, Article 9
ISO 42001: A.7.4, A.9.3
NIST AI RMF: GOVERN-1.7, MANAGE-2.2
OWASP LLM Top 10: LLM05

What are the technical details?

Original Advisory

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. When vLLM is configured to use Mooncake, unsafe deserialization exposed directly over ZMQ/TCP on all network interfaces will allow attackers to execute remote code on distributed hosts. This is a remote code execution vulnerability impacting any deployments using Mooncake to distribute KV across distributed hosts. This vulnerability is fixed in 0.8.0.
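
The advisory notes that the endpoint is exposed on all network interfaces. As an aid when reviewing configuration or logs, the following is a minimal sketch that flags ZMQ-style endpoint strings whose bind address is a wildcard. The function name and the sample endpoint strings are illustrative assumptions; they are not taken from vLLM or Mooncake configuration.

```python
import ipaddress


def binds_all_interfaces(endpoint: str) -> bool:
    """Return True if a 'tcp://host:port' endpoint binds to every interface.

    Recognized wildcards: '*', '0.0.0.0', and '::' (optionally bracketed).
    Non-TCP transports such as ipc:// or inproc:// return False because
    they are not reachable over the network.
    """
    scheme, sep, rest = endpoint.partition("://")
    if not sep or scheme != "tcp":
        return False
    host, _, _port = rest.rpartition(":")
    host = host.strip("[]")
    if host == "*":
        return True
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # A hostname or interface name, not a wildcard address.
        return False


# Illustrative endpoint strings (hypothetical values).
for candidate in ("tcp://*:5555", "tcp://0.0.0.0:5555", "tcp://127.0.0.1:5555"):
    print(candidate, binds_all_interfaces(candidate))
```

A wildcard bind is not a vulnerability by itself, but combined with unsafe deserialization it means any host that can route to the node can reach the vulnerable code path.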

Exploitation Scenario

An attacker with access to the same internal network segment as a vLLM cluster—via a compromised developer laptop, a rogue container in the same Kubernetes namespace, or an insider—scans for open ZMQ/TCP ports on inference worker nodes. Using a crafted pickle payload or other malicious serialized object, the attacker sends it directly to the Mooncake ZMQ endpoint. The unsafe deserialization triggers arbitrary code execution under the vLLM worker process. The attacker then: establishes a reverse shell for persistent access, harvests model weights and KV cache contents (which may include thousands of prior user prompts), extracts API keys or cloud credentials from the process environment, pivots laterally to other cluster nodes, and optionally backdoors the inference pipeline to manipulate LLM outputs for downstream applications without detection.

Weaknesses (CWE)

CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

  • [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
  • [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
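
The two CWE mitigations above can be combined. The following is a minimal sketch, not the fix that vLLM shipped: it authenticates each message with an HMAC before parsing it, uses JSON instead of a code-executing format, and populates a new validated object from the parsed fields. The `KVChunkHeader` type, its fields, and the shared key are hypothetical.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


@dataclass(frozen=True)
class KVChunkHeader:
    """Hypothetical message type used only for illustration."""
    request_id: str
    num_tokens: int


def seal(header: KVChunkHeader, key: bytes) -> bytes:
    """Serialize to JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(
        {"request_id": header.request_id, "num_tokens": header.num_tokens},
        sort_keys=True,
    ).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).digest() + body


def unseal(message: bytes, key: bytes) -> KVChunkHeader:
    """Verify the tag first, then parse and validate each field."""
    if len(message) < TAG_LEN:
        raise ValueError("message too short")
    tag, body = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication tag mismatch")
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    request_id, num_tokens = data.get("request_id"), data.get("num_tokens")
    if not isinstance(request_id, str):
        raise ValueError("request_id must be a string")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(num_tokens, bool) or not isinstance(num_tokens, int) or num_tokens < 0:
        raise ValueError("num_tokens must be a non-negative integer")
    return KVChunkHeader(request_id=request_id, num_tokens=num_tokens)


key = b"demo-key-load-from-a-secret-store"  # placeholder, never hard-code keys
wire = seal(KVChunkHeader("req-1", 128), key)
print(unseal(wire, key))
```

Checking the tag before parsing means unauthenticated bytes never reach the parser, and building the dataclass from individually validated fields means the receiver only ever holds types it chose.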

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H
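
To show how this vector maps to the 9.0 base score, here is a minimal sketch of the CVSS v3.1 base score calculation using the metric weights and rounding rule from the public specification. It handles base metrics only (no temporal or environmental metrics), and the function names are illustrative.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on whether Scope is Changed.
PR_WEIGHTS = {
    False: {"N": 0.85, "L": 0.62, "H": 0.27},
    True: {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/...' vector string."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are supported")
    metrics = dict(part.split(":") for part in parts)
    changed = metrics["S"] == "C"

    conf, integ, avail = (WEIGHTS[m][metrics[m]] for m in ("C", "I", "A"))
    iss = 1 - (1 - conf) * (1 - integ) * (1 - avail)
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[changed][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H"))  # prints 9.0
```

The Changed scope is what lifts an adjacent-network, low-privilege vector into the critical band: it raises the PR:L weight from 0.62 to 0.68 and applies the 1.08 multiplier to the combined score.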

Timeline

Published: March 19, 2025
Last Modified: July 2, 2025
First Seen: March 19, 2025

Related Vulnerabilities