CVE-2025-30165: vLLM: pickle RCE in multi-node inference deployments

GHSA-9pcc-gvx5-r5wm HIGH
Published May 6, 2025
CISO Take

If your organization runs vLLM with tensor parallelism across multiple hosts using the V0 engine (versions 0.5.2 through 0.9.x), the secondary inference nodes can be compromised through malicious pickle deserialization, and the maintainers have declined to fix the V0 code path. Migrate to vLLM 0.10.0 or later with the V1 engine immediately, or isolate all vLLM cluster traffic on a strictly controlled network segment. Deployments on v0.8.0 or later that keep the default engine settings (V1) are not affected.

What is the risk?

CVSS 8.0 (High), with an adjacent-network attack vector. Exploitability is moderate: an attacker needs either a position on the same network segment (for example, to perform ARP cache poisoning) or a foothold on the primary vLLM host. The CVSS vector rates the required privileges as Low and requires no user interaction. Because the maintainers have declined to fix the issue, organizations that keep V0 multi-node deployments face permanent, unpatched exposure. Real-world prevalence is likely limited: V0 has not been the default since v0.8.0, and tensor parallelism across multiple hosts is uncommon outside large-scale GPU deployments. EPSS is also low, at 0.48%.

What systems are affected?

Package   Ecosystem   Vulnerable Range       Patched
vLLM      pip         (not specified)        No patch
vLLM      pip         >= 0.5.2, < 0.10.0     0.10.0

How severe is it?

CVSS 3.1: 8.0 / 10
EPSS: 0.5% chance of exploitation in the next 30 days (higher than 38% of all CVEs)
Exploitation status: No known exploitation
Sophistication: Moderate

What is the attack surface?

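Attack Vector (AV): Adjacent
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High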

What should I do?

  1. IMMEDIATE: Audit vLLM deployments and identify any multi-node tensor-parallel setups running vLLM < 0.10.0 or explicitly using the V0 engine.
  2. PATCH: Upgrade to vLLM >= 0.10.0 and confirm the V1 engine is active (the default since v0.8.0). Check the startup logs and make sure V0 has not been forced, for example by setting VLLM_USE_V1=0.
  3. NETWORK CONTROLS: If V0 must be retained, isolate vLLM cluster traffic on a dedicated VLAN with strict ACLs, and enable Dynamic ARP Inspection (DAI) on managed switches to block ARP poisoning.
  4. MONITOR: Alert on unexpected ZeroMQ traffic from unauthorized sources within inference VLANs. vLLM typically selects these ports at runtime rather than using a fixed range, so baseline the ports your deployment actually uses.
  5. HARDEN: As defense in depth for any distributed ML serving infrastructure, authenticate messages between inference nodes with mutual TLS or HMAC.
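
For the audit step, a small triage helper can flag hosts that are likely exposed. The sketch below is a minimal illustration, not a vLLM tool, and its function names are hypothetical. It encodes the affected range from this advisory (>= 0.5.2, < 0.10.0) and assumes V0 is in use either because the release predates v0.8.0 or because VLLM_USE_V1 is set to 0. Whether a deployment runs tensor parallelism across hosts must be supplied by the operator, and automatic fallback to V0 is not detected, so the startup logs remain the authority.

```python
import os
import re
from typing import Mapping, Optional, Tuple

# Affected range taken from this advisory: >= 0.5.2, < 0.10.0
AFFECTED_MIN = (0, 5, 2)
FIRST_UNAFFECTED = (0, 10, 0)
# V1 became the default engine in v0.8.0
V1_DEFAULT_SINCE = (0, 8, 0)


def parse_release(version: str) -> Tuple[int, int, int]:
    """Extract the numeric release, e.g. '0.9.2rc1' -> (0, 9, 2).

    Simplification: pre-release/dev suffixes are ignored, so '0.10.0rc1'
    is treated as (0, 10, 0) even though it sorts before 0.10.0.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    padded = (parts + [0, 0, 0])[:3]  # '0.10' -> (0, 10, 0)
    return (padded[0], padded[1], padded[2])


def in_affected_range(version: str) -> bool:
    return AFFECTED_MIN <= parse_release(version) < FIRST_UNAFFECTED


def v0_forced(env: Mapping[str, str]) -> bool:
    """Assumption: the V0 engine is forced by setting VLLM_USE_V1=0."""
    return env.get("VLLM_USE_V1", "").strip() == "0"


def likely_exposed(version: str, multi_node_tp: bool,
                   env: Optional[Mapping[str, str]] = None) -> bool:
    """Heuristic triage only; confirm the active engine in vLLM's startup logs."""
    if env is None:
        env = os.environ
    if not multi_node_tp or not in_affected_range(version):
        return False
    if parse_release(version) < V1_DEFAULT_SINCE:
        # V0 was the default before v0.8.0; treat as exposed conservatively.
        return True
    return v0_forced(env)
```

On a host with vLLM installed, the version string can come from `importlib.metadata.version("vllm")`. Because pre-release suffixes are ignored, treat release candidates of 0.10.0 as potentially affected.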

What does CISA's SSVC say?

Decision: Track
Exploitation: none
Automatable: No
Technical Impact: total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.


Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Article 15 - Accuracy, robustness and cybersecurity
ISO 42001: A.6.1 - Policies for information security in AI systems
NIST AI RMF: MANAGE 2.2 - Mechanisms to sustain the value of deployed AI systems
OWASP LLM Top 10: LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2025-30165?

CVE-2025-30165 is an insecure-deserialization vulnerability (CWE-502) in vLLM versions 0.5.2 through 0.9.x. In multi-node deployments that use the V0 engine with tensor parallelism across hosts, secondary nodes deserialize data received on a ZeroMQ SUB socket with Python's pickle, so anyone who can send data on that channel can execute arbitrary code on those nodes. The V1 engine, the default since v0.8.0, is not affected.

Is CVE-2025-30165 actively exploited?

No confirmed active exploitation of CVE-2025-30165 has been reported, but organizations running affected V0 multi-node deployments should still upgrade or isolate them proactively.

How to fix CVE-2025-30165?

Upgrade to vLLM 0.10.0 or later and confirm the V1 engine is active. If the V0 engine must remain in a multi-node tensor-parallel deployment, isolate cluster traffic on a dedicated, access-controlled network segment with Dynamic ARP Inspection enabled, and monitor for unexpected ZeroMQ traffic. The full remediation steps are listed under "What should I do?" above.

What systems are affected by CVE-2025-30165?

This vulnerability affects the following AI/ML architecture patterns: distributed LLM inference clusters, multi-node tensor parallel serving, LLM inference infrastructure, model serving pipelines.

What is the CVSS score for CVE-2025-30165?

CVE-2025-30165 has a CVSS v3.1 base score of 8.0 (HIGH). The EPSS exploitation probability is 0.48%.

What is the AI security impact?

Affected AI Architectures

distributed LLM inference clusters, multi-node tensor parallel serving, LLM inference infrastructure, model serving pipelines

MITRE ATLAS Techniques

AML.T0025 Exfiltration via Cyber Means
AML.T0044 Full AI Model Access
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.1
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM05

What are the technical details?

Original Advisory

vLLM is an inference and serving engine for large language models. In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a `SUB` ZeroMQ socket and connect to an `XPUB` socket on the primary vLLM host. When data is received on this `SUB` socket, it is deserialized with `pickle`. This is unsafe, as it can be abused to execute code on a remote machine.

Since the vulnerability exists in a client that connects to the primary vLLM host, this vulnerability serves as an escalation point. If the primary vLLM host is compromised, this vulnerability could be used to compromise the rest of the hosts in the vLLM deployment. Attackers could also use other means to exploit the vulnerability without requiring access to the primary vLLM host. One example would be the use of ARP cache poisoning to redirect traffic to a malicious endpoint used to deliver a payload with arbitrary code to execute on the target machine.

Note that this issue only affects the V0 engine, which has been off by default since v0.8.0. Further, the issue only applies to a deployment using tensor parallelism across multiple hosts, which we do not expect to be a common deployment pattern. Since V0 has been off by default since v0.8.0 and the fix is fairly invasive, the maintainers of vLLM have decided not to fix this issue. Instead, the maintainers recommend that users ensure their environment is on a secure network in case this pattern is in use. The V1 engine is not affected by this issue.

Exploitation Scenario

An attacker positioned on the same network segment as a vLLM multi-node cluster performs ARP cache poisoning to impersonate the primary vLLM host's IP address, so that traffic from the secondary worker nodes to the primary's XPUB ZeroMQ endpoint reaches an attacker-controlled publisher instead. The secondary nodes, which subscribe to that endpoint, then receive attacker-controlled ZeroMQ messages. These messages contain a crafted pickle payload encoding a Python object whose __reduce__ method executes arbitrary OS commands upon deserialization (for example, a reverse shell, a credential dump, or ransomware). Because every subscribed secondary node deserializes the same broadcast, the attacker can achieve RCE on all of them at once, gaining control of GPU inference processes, the model weights resident in memory, and any secrets accessible to the vLLM service account, without authenticating to any vLLM host and with no user interaction required.
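
The core problem is that `pickle.loads` lets the sender decide which callable the receiver runs. The harmless sketch below illustrates the `__reduce__` technique from this scenario: the payload asks the receiver to evaluate an expression that returns the receiver's own process ID, where a real attacker would substitute a shell command. `handle_broadcast` is a hypothetical stand-in for a secondary node's SUB-socket handler, not vLLM code, and no network or ZeroMQ is involved.

```python
import os
import pickle


class Payload:
    """Illustrative attacker object (harmless)."""

    def __reduce__(self):
        # __reduce__ tells pickle how to rebuild the object: "call eval with
        # this argument". The receiver's pickle.loads performs that call.
        # A real attacker would use something like os.system instead.
        return (eval, ("__import__('os').getpid()",))


def handle_broadcast(data: bytes):
    """Hypothetical stand-in for a secondary node's SUB-socket handler."""
    return pickle.loads(data)  # unsafe: runs whatever callable the sender chose


if __name__ == "__main__":
    wire_bytes = pickle.dumps(Payload())  # what the attacker would publish
    result = handle_broadcast(wire_bytes)
    # The receiver executed attacker-chosen code that returned its own PID.
    print(result == os.getpid())  # prints True
```

Nothing about the receiving code needs to reference `Payload`: the serialized bytes carry only a reference to `eval` and its argument, which is why validating the decoded object afterwards is too late.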

Weaknesses (CWE)

CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

  • [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
  • [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
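
The two mitigations above can be combined, as a sketch under stated assumptions: messages are encoded as JSON instead of pickle, each frame carries an HMAC-SHA256 tag over the body, and the receiver verifies the tag in constant time before populating a fresh, typed object from validated fields. The message schema (`BroadcastMsg` with `step` and `token_ids`), the frame layout (a 32-byte tag followed by the body), and the shared key are all hypothetical; this is not a vLLM feature or configuration option, and key distribution is out of scope.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Tuple

TAG_LEN = hashlib.sha256().digest_size  # 32-byte HMAC-SHA256 tag


@dataclass(frozen=True)
class BroadcastMsg:
    """Hypothetical message schema; a real deployment would define its own."""
    step: int
    token_ids: Tuple[int, ...]


def seal(msg: BroadcastMsg, key: bytes) -> bytes:
    """Encode as JSON (data only) and prefix an HMAC tag over the body."""
    body = json.dumps({"step": msg.step, "token_ids": list(msg.token_ids)},
                      separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def open_sealed(frame: bytes, key: bytes) -> BroadcastMsg:
    """Verify the tag, then populate a new object from validated fields."""
    tag, body = frame[:TAG_LEN], frame[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("HMAC verification failed; dropping message")
    raw = json.loads(body)  # JSON parsing cannot invoke arbitrary callables
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object")
    step, token_ids = raw.get("step"), raw.get("token_ids")
    valid = (
        type(step) is int  # rejects bools, which subclass int
        and isinstance(token_ids, list)
        and all(type(t) is int for t in token_ids)
    )
    if not valid:
        raise ValueError("malformed message")
    return BroadcastMsg(step=step, token_ids=tuple(token_ids))
```

HMAC alone stops an on-path attacker who lacks the key (the ARP-poisoning case), but not a compromised primary host that holds it, which is why the sketch also replaces pickle with a data-only format.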

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
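
As a worked check, this vector reproduces the 8.0 base score under the CVSS v3.1 base-score equations. The sketch below is a minimal re-implementation of those equations (metric weights, ISS, Impact, Exploitability, and the specification's Roundup function) for illustration only; the official FIRST calculator remains the reference.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on Scope (Unchanged vs Changed).
PR_WEIGHTS = {"U": {"N": 0.85, "L": 0.62, "H": 0.27},
              "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def parse_vector(vector: str) -> dict:
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    return dict(item.split(":") for item in metrics)


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    scope = m["S"]
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[m["C"]]) * (1 - cia[m["I"]]) * (1 - cia[m["A"]])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                      * PR_WEIGHTS[scope][m["PR"]] * WEIGHTS["UI"][m["UI"]])
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))
```

With PR:N in place of PR:L, the same function returns 8.8, so the Low privileges rating is what keeps this CVE at 8.0.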

Timeline

Published: May 6, 2025
Last Modified: December 5, 2025
First Seen: May 6, 2025

Related Vulnerabilities