CVE-2025-29783: vLLM: RCE via unsafe deserialization in Mooncake KV

GHSA-x3m8-f7g5-qhm7 · CRITICAL · PoC available · CISA SSVC: Track*
Published March 19, 2025
CISO Take

Any vLLM deployment running Mooncake for distributed KV cache (v0.6.5–v0.7.x) is exposed to unauthenticated RCE from any adjacent-network host with zero user interaction. Patch to v0.8.0 immediately—this is trivial to exploit once network-adjacent and no special AI knowledge is required. If patching is blocked, disable Mooncake KV distribution and isolate ZMQ/TCP inference ports at the network layer until remediation is complete.
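The vulnerable range can be checked mechanically during triage. A minimal sketch, assuming plain dotted-integer version strings (real vLLM releases may carry rc/post suffixes, which would need `packaging.version` instead):

```python
# Hedged sketch: flag vLLM versions inside the advisory's vulnerable range
# (>= 0.6.5, < 0.8.0). The dotted-integer parsing below is a simplification.

def parse(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))

def is_vulnerable(version: str) -> bool:
    return parse("0.6.5") <= parse(version) < parse("0.8.0")

if __name__ == "__main__":
    for v in ("0.6.4", "0.6.5", "0.7.3", "0.8.0"):
        print(v, "vulnerable" if is_vulnerable(v) else "ok")
```

Running this against `pip show vllm` output (or your lockfiles) gives a quick fleet-wide exposure inventory before patching begins.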

Risk Assessment

Critical operational risk despite AV:A scope. Distributed LLM inference clusters typically share flat internal network segments with CI/CD systems, data pipelines, and developer workstations—making 'adjacent network' far easier to reach than perimeter controls suggest. Attack complexity is low, privileges required are minimal, and no user interaction is needed. A single compromised inference node provides full cluster access, model weights, cached inference data containing potentially sensitive prompts and responses, and lateral movement paths to adjacent infrastructure. Organizations running large-scale inference farms should treat this as P0.

Affected Systems

Package Ecosystem Vulnerable Range Patched
vllm pip >= 0.6.5, < 0.8.0 0.8.0

Severity & Risk

CVSS 3.1: 9.0 / 10 (Critical)
EPSS: 2.8% chance of exploitation within 30 days (higher than 86% of all CVEs)
Exploitation status: exploit available; estimated likelihood medium
Sophistication: trivial
Exploitation confidence: medium (public PoC indexed via trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV (Attack Vector): Adjacent
AC (Attack Complexity): Low
PR (Privileges Required): Low
UI (User Interaction): None
S (Scope): Changed
C (Confidentiality): High
I (Integrity): High
A (Availability): High

Recommended Action

6 steps
  1. PATCH

    Upgrade vLLM to >= 0.8.0 immediately. This is the only complete fix per the advisory.

  2. WORKAROUND

    If patching is blocked, disable Mooncake KV distribution entirely; fall back to single-node inference or an alternative KV backend.

  3. NETWORK SEGMENTATION

    Apply strict firewall rules on ZMQ/TCP ports used by Mooncake. Only authenticated inference cluster nodes should reach these endpoints—block all other sources including developer and CI/CD networks.

  4. ISOLATION

    Place inference nodes in a dedicated network segment with no direct access from developer machines, containers, or build pipelines.

  5. DETECTION

    Monitor vLLM worker processes for anomalous outbound connections, unexpected child process spawns, or unauthorized file system access. Alert on any new listening sockets opened by inference processes.

  6. AUDIT

    Rotate any credentials, API keys, or tokens accessible from inference node environments as a precaution if exposure cannot be ruled out.
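Steps 3–4 can be verified from the outside in. A minimal sketch, assuming the operator knows the Mooncake endpoint's host and port (the target values below are placeholders): run it from a host that should be blocked and expect `False` once segmentation is in place.

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Placeholder target: substitute the actual Mooncake ZMQ/TCP endpoint.
    print(port_reachable("127.0.0.1", 65000))
```

Note this only confirms TCP reachability; it does not probe the deserialization flaw itself, so it is safe to run from developer and CI/CD networks as a segmentation check.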

CISA SSVC Assessment

Decision Track*
Exploitation none
Automatable Yes
Technical Impact total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
Article 9 - Risk management system
ISO 42001
A.7.4 - Risk treatment
A.9.3 - AI system security within lifecycle
NIST AI RMF
GOVERN-1.7 - Processes for AI risk and impact assessment
MANAGE-2.2 - Mechanisms are in place to sustain value of deployed AI systems
OWASP LLM Top 10
LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2025-29783?

CVE-2025-29783 is an unsafe-deserialization vulnerability in vLLM's Mooncake integration for distributed KV cache (v0.6.5–v0.7.x). The deserialization endpoint is exposed over ZMQ/TCP on all network interfaces, allowing unauthenticated remote code execution from any adjacent-network host with zero user interaction. It is fixed in v0.8.0.

Is CVE-2025-29783 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2025-29783, increasing the risk of exploitation.

How to fix CVE-2025-29783?

1. PATCH: Upgrade vLLM to >= 0.8.0 immediately; this is the only complete fix per the advisory.
2. WORKAROUND (if patching is blocked): Disable Mooncake KV distribution entirely; fall back to single-node inference or an alternative KV backend.
3. NETWORK SEGMENTATION: Apply strict firewall rules on the ZMQ/TCP ports used by Mooncake; only authenticated inference cluster nodes should reach these endpoints, with all other sources blocked, including developer and CI/CD networks.
4. ISOLATION: Place inference nodes in a dedicated network segment with no direct access from developer machines, containers, or build pipelines.
5. DETECTION: Monitor vLLM worker processes for anomalous outbound connections, unexpected child process spawns, or unauthorized file system access; alert on any new listening sockets opened by inference processes.
6. AUDIT: Rotate any credentials, API keys, or tokens accessible from inference node environments if exposure cannot be ruled out.

What systems are affected by CVE-2025-29783?

This vulnerability affects the following AI/ML architecture patterns: distributed LLM inference, model serving, multi-node GPU inference clusters, LLM inference pipelines.

What is the CVSS score for CVE-2025-29783?

CVE-2025-29783 has a CVSS v3.1 base score of 9.0 (CRITICAL). The EPSS exploitation probability is 2.81%.

Technical Details

NVD Description

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. When vLLM is configured to use Mooncake, unsafe deserialization exposed directly over ZMQ/TCP on all network interfaces will allow attackers to execute remote code on distributed hosts. This is a remote code execution vulnerability impacting any deployments using Mooncake to distribute KV across distributed hosts. This vulnerability is fixed in 0.8.0.
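The root cause generalizes beyond vLLM: Python's pickle protocol executes attacker-chosen callables during loading. A minimal illustration (not vLLM's actual code) of why deserializing untrusted bytes is equivalent to code execution, contrasted with a data-only format:

```python
import json
import pickle

class Rigged:
    """Stand-in for a malicious serialized object (illustration only)."""
    def __reduce__(self):
        # pickle invokes this callable with these args during loading; a
        # real attacker would substitute os.system or similar.
        return (eval, ("1 + 1",))

payload = pickle.dumps(Rigged())
result = pickle.loads(payload)   # the attacker-chosen call runs here -> 2

# A schema-bound format like JSON yields only plain data types and cannot
# trigger calls during deserialization.
safe = json.loads('{"kv_block": [1, 2, 3]}')
```

This is why the advisory treats network exposure of the deserialization endpoint as directly equivalent to RCE: no memory-corruption exploit is needed, only a crafted byte stream.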

Exploitation Scenario

An attacker with access to the same internal network segment as a vLLM cluster—via a compromised developer laptop, a rogue container in the same Kubernetes namespace, or an insider—scans for open ZMQ/TCP ports on inference worker nodes. Using a crafted pickle payload or other malicious serialized object, the attacker sends it directly to the Mooncake ZMQ endpoint. The unsafe deserialization triggers arbitrary code execution under the vLLM worker process. The attacker then: establishes a reverse shell for persistent access, harvests model weights and KV cache contents (which may include thousands of prior user prompts), extracts API keys or cloud credentials from the process environment, pivots laterally to other cluster nodes, and optionally backdoors the inference pipeline to manipulate LLM outputs for downstream applications without detection.
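Rather than reproducing the attack, the scenario above suggests a detection heuristic: an inference worker has no legitimate reason to spawn shells or network tools. A minimal allowlist sketch (the process names are assumptions for illustration, not vLLM specifics):

```python
# Hypothetical allowlist of child-process names for an inference worker.
EXPECTED_CHILDREN = {"python", "python3"}

def anomalous_children(observed: set) -> set:
    """Return observed child-process names that fall outside the allowlist."""
    return set(observed) - EXPECTED_CHILDREN

# In production, the observed set could be fed from a process-monitoring
# agent (e.g. psutil's children() on the worker PID).
```

A hit on names like `bash` or `curl` maps directly to the reverse-shell and credential-exfiltration stages of the scenario above.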

CVSS Vector

CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H
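For cross-checking the vector against the Attack Surface metrics listed earlier, the string decomposes mechanically:

```python
def parse_cvss(vector: str) -> dict:
    """Split a CVSS v3.x vector string into {metric: value} pairs."""
    metrics = vector.split("/")[1:]          # drop the "CVSS:3.1" prefix
    return dict(m.split(":", 1) for m in metrics)

metrics = parse_cvss("CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H")
```

Each pair (AV:A, AC:L, PR:L, UI:N, S:C, C:H, I:H, A:H) corresponds one-to-one to the metric rows in the Attack Surface section.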

Timeline

Published
March 19, 2025
Last Modified
July 2, 2025
First Seen
March 19, 2025

Related Vulnerabilities