CVE-2025-30165: vLLM: pickle RCE in multi-node inference deployments

GHSA-9pcc-gvx5-r5wm HIGH
Published May 6, 2025
CISO Take

If your organization runs vLLM with tensor parallelism across multiple hosts using the V0 engine (versions 0.5.2–0.9.x), secondary inference nodes are exploitable via malicious pickle deserialization — and no patch is coming. Migrate to vLLM >= 0.10.0 with the V1 engine immediately, or isolate all vLLM cluster traffic on a strictly controlled network segment. Organizations on v0.8.0+ with default settings are not affected.

Risk Assessment

CVSS 8.0/High with adjacent-network attack vector. Exploitability is moderate: an attacker needs either network adjacency (enabling ARP cache poisoning) or a foothold on the primary vLLM host. No authentication is required and no user interaction is needed. The vendor has issued a formal wontfix — organizations still on V0 multi-node deployments face permanent, unpatched exposure. Real-world prevalence is limited given V0 is non-default since v0.8.0 and distributed tensor parallelism is uncommon outside large-scale GPU deployments, keeping EPSS low at 1.3%.

Affected Systems

Package Ecosystem Vulnerable Range Patched
vllm pip >= 0.5.2, < 0.10.0 0.10.0

Severity & Risk

CVSS 3.1
8.0 / 10
EPSS
1.3%
chance of exploitation in 30 days
Higher than 80% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Moderate

Attack Surface

Attack Vector (AV): Adjacent
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High

Recommended Action

  1. IMMEDIATE: Audit vLLM deployments and identify any multi-node tensor parallelism setups running vLLM < 0.10.0 or explicitly using the V0 engine.
  2. PATCH: Upgrade to vLLM >= 0.10.0 and confirm the V1 engine is active (default since v0.8.0; verify in startup logs and ensure V0 has not been re-enabled, e.g. via the VLLM_USE_V1=0 environment variable).
  3. NETWORK CONTROLS: If V0 must be retained, isolate vLLM cluster traffic on a dedicated VLAN with strict ACLs; enable Dynamic ARP Inspection (DAI) on managed switches to block ARP poisoning.
  4. MONITOR: Alert on unexpected ZeroMQ traffic from unauthorized sources within inference VLANs (the ports in use are deployment-specific; take them from your cluster configuration).
  5. HARDEN: Enforce mutual TLS or HMAC message authentication between inference nodes as defense-in-depth for any distributed ML serving infrastructure.
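As a starting point for the audit in step 1, the installed vLLM version can be checked against the vulnerable range programmatically. This is a minimal sketch assuming vLLM was installed via pip; it only inspects package metadata and uses a simplified version parser (a production audit should use a full PEP 440 parser such as `packaging.version`):

```python
from importlib.metadata import version, PackageNotFoundError

VULN_LOW, VULN_HIGH = (0, 5, 2), (0, 10, 0)  # vulnerable range: >= 0.5.2, < 0.10.0

def parse(ver: str) -> tuple:
    """Parse a dotted version string into a comparable tuple of ints.

    Simplified: keeps only the leading digits of each of the first three
    components, so pre-release suffixes like '0.9.1rc1' map to (0, 9, 1).
    """
    parts = []
    for piece in ver.split(".")[:3]:
        num = ""
        for ch in piece:
            if ch.isdigit():
                num += ch
            else:
                break
        parts.append(int(num) if num else 0)
    return tuple(parts)

def in_vulnerable_range(ver: str) -> bool:
    """True if the version falls in the CVE-2025-30165 affected range."""
    return VULN_LOW <= parse(ver) < VULN_HIGH

def audit() -> str:
    """Report whether the locally installed vllm package is in range."""
    try:
        v = version("vllm")
    except PackageNotFoundError:
        return "vllm not installed"
    if in_vulnerable_range(v):
        return f"vllm {v}: in vulnerable range; check for V0 multi-node tensor parallelism"
    return f"vllm {v}: outside vulnerable range"
```

A version check alone is not sufficient: a deployment in range is only exposed if it actually runs V0 multi-node tensor parallelism, so follow up with a configuration review.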

CISA SSVC Assessment

Decision: Track
Exploitation: None
Automatable: No
Technical Impact: Total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.1 - Policies for information security in AI systems
NIST AI RMF
MANAGE 2.2 - Mechanisms to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2025-30165?

If your organization runs vLLM with tensor parallelism across multiple hosts using the V0 engine (versions 0.5.2–0.9.x), secondary inference nodes are exploitable via malicious pickle deserialization — and no patch is coming. Migrate to vLLM >= 0.10.0 with the V1 engine immediately, or isolate all vLLM cluster traffic on a strictly controlled network segment. Organizations on v0.8.0+ with default settings are not affected.

Is CVE-2025-30165 actively exploited?

No confirmed active exploitation of CVE-2025-30165 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-30165?

1) IMMEDIATE: Audit vLLM deployments and identify any multi-node tensor parallelism setups running vLLM < 0.10.0 or explicitly using the V0 engine. 2) PATCH: Upgrade to vLLM >= 0.10.0 and confirm the V1 engine is active (default since v0.8.0; verify in startup logs and ensure V0 has not been re-enabled, e.g. via the VLLM_USE_V1=0 environment variable). 3) NETWORK CONTROLS: If V0 must be retained, isolate vLLM cluster traffic on a dedicated VLAN with strict ACLs; enable Dynamic ARP Inspection (DAI) on managed switches to block ARP poisoning. 4) MONITOR: Alert on unexpected ZeroMQ traffic from unauthorized sources within inference VLANs (the ports in use are deployment-specific; take them from your cluster configuration). 5) HARDEN: Enforce mutual TLS or HMAC message authentication between inference nodes as defense-in-depth for any distributed ML serving infrastructure.

What systems are affected by CVE-2025-30165?

This vulnerability affects the following AI/ML architecture patterns: distributed LLM inference clusters, multi-node tensor parallel serving, LLM inference infrastructure, model serving pipelines.

What is the CVSS score for CVE-2025-30165?

CVE-2025-30165 has a CVSS v3.1 base score of 8.0 (HIGH). The EPSS exploitation probability is 1.31%.

Technical Details

NVD Description

vLLM is an inference and serving engine for large language models. In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a `SUB` ZeroMQ socket and connect to an `XPUB` socket on the primary vLLM host. When data is received on this `SUB` socket, it is deserialized with `pickle`. This is unsafe, as it can be abused to execute code on a remote machine. Since the vulnerability exists in a client that connects to the primary vLLM host, this vulnerability serves as an escalation point. If the primary vLLM host is compromised, this vulnerability could be used to compromise the rest of the hosts in the vLLM deployment. Attackers could also use other means to exploit the vulnerability without requiring access to the primary vLLM host. One example would be the use of ARP cache poisoning to redirect traffic to a malicious endpoint used to deliver a payload with arbitrary code to execute on the target machine. Note that this issue only affects the V0 engine, which has been off by default since v0.8.0. Further, the issue only applies to a deployment using tensor parallelism across multiple hosts, which we do not expect to be a common deployment pattern. Since V0 has been off by default since v0.8.0 and the fix is fairly invasive, the maintainers of vLLM have decided not to fix this issue. Instead, the maintainers recommend that users ensure their environment is on a secure network in case this pattern is in use. The V1 engine is not affected by this issue.
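The core problem described above, calling `pickle.loads` on untrusted bytes, can be illustrated with a benign stand-in payload. `run_command` here is a hypothetical placeholder for whatever callable an attacker would actually choose (e.g. `os.system`), and the blob stands in for a message received on the `SUB` socket:

```python
import pickle

def run_command(cmd: str) -> str:
    # Hypothetical stand-in: a real payload would invoke os.system,
    # subprocess.call, or similar to execute `cmd` on the victim host.
    return f"executed: {cmd}"

class Payload:
    """When unpickled, pickle calls run_command("id") instead of
    reconstructing a Payload object -- __reduce__ controls this."""
    def __reduce__(self):
        return (run_command, ("id",))

blob = pickle.dumps(Payload())   # what the attacker sends over ZeroMQ
result = pickle.loads(blob)      # the secondary node's deserialization:
                                 # an attacker-chosen callable runs here
```

The key point is that `pickle.loads` executes the callable named by `__reduce__` during deserialization, so any process that unpickles attacker-controlled bytes is executing attacker-controlled code.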

Exploitation Scenario

An attacker positioned on the same network segment as a vLLM multi-node cluster performs ARP cache poisoning to impersonate the primary vLLM host's IP, hijacking the XPUB ZeroMQ socket. Secondary worker nodes — configured to subscribe to the primary's XPUB endpoint — begin receiving attacker-controlled ZeroMQ messages. These messages contain a crafted pickle payload encoding a Python object whose __reduce__ method executes arbitrary OS commands upon deserialization (e.g., reverse shell, credential dump, or ransomware). The attacker achieves simultaneous RCE on all subscribed secondary nodes, gaining control of GPU inference processes, model weights resident in memory, and any secrets accessible to the vLLM service account — all with no authentication and no user interaction required.
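The HMAC message authentication suggested in the recommendations can be sketched as follows. The shared key and message format are assumptions for illustration, and JSON replaces pickle so that deserialization itself cannot execute code:

```python
import hashlib
import hmac
import json

# Hypothetical shared secret, provisioned out-of-band to every node.
CLUSTER_KEY = b"example-32-byte-cluster-secret!!"

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes

def pack(message: dict) -> bytes:
    """Serialize with JSON (no code execution on load) and prepend a MAC."""
    body = json.dumps(message, sort_keys=True).encode()
    tag = hmac.new(CLUSTER_KEY, body, hashlib.sha256).digest()
    return tag + body

def unpack(wire: bytes) -> dict:
    """Verify the MAC in constant time before touching the payload."""
    tag, body = wire[:TAG_LEN], wire[TAG_LEN:]
    expected = hmac.new(CLUSTER_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("unauthenticated message dropped")
    return json.loads(body)
```

An on-path attacker who cannot obtain `CLUSTER_KEY` can no longer substitute payloads that the receiver will accept. This mitigates the spoofing path but is defense-in-depth only; it does not replace migrating off the V0 engine.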

CVSS Vector

CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H

Timeline

Published
May 6, 2025
Last Modified
December 5, 2025
First Seen
May 6, 2025

Related Vulnerabilities