If your organization runs vLLM with tensor parallelism across multiple hosts using the V0 engine (versions 0.5.2–0.9.x), secondary inference nodes are exploitable via malicious pickle deserialization — and no patch is coming. Migrate to vLLM >= 0.10.0 with the V1 engine immediately, or isolate all vLLM cluster traffic on a strictly controlled network segment. Organizations on v0.8.0+ with default settings are not affected.
Risk Assessment
CVSS 8.0/High with adjacent-network attack vector. Exploitability is moderate: an attacker needs either network adjacency (enabling ARP cache poisoning) or a foothold on the primary vLLM host. No authentication is required and no user interaction is needed. The vendor has issued a formal wontfix — organizations still on V0 multi-node deployments face permanent, unpatched exposure. Real-world prevalence is limited given V0 is non-default since v0.8.0 and distributed tensor parallelism is uncommon outside large-scale GPU deployments, keeping EPSS low at 1.3%.
Recommended Action
1) IMMEDIATE: Audit vLLM deployments and identify any multi-node tensor-parallel setups running vLLM < 0.10.0 or explicitly using the V0 engine.
2) PATCH: Upgrade to vLLM >= 0.10.0 and confirm the V1 engine is active (default since v0.8.0; verify in startup logs, or confirm the VLLM_USE_V1 environment variable is not set to 0).
3) NETWORK CONTROLS: If V0 must be retained, isolate vLLM cluster traffic on a dedicated VLAN with strict ACLs, and enable Dynamic ARP Inspection (DAI) on managed switches to block ARP poisoning.
4) MONITOR: Alert on unexpected ZeroMQ traffic (commonly TCP 5555-5557) from unauthorized sources within inference VLANs.
5) HARDEN: Enforce mutual TLS or HMAC message authentication between inference nodes as defense in depth for any distributed ML serving infrastructure.
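The audit in step 1 can be scripted. A minimal sketch of the check, where the function and parameter names are illustrative rather than part of vLLM's API, mirroring the vulnerable pattern this advisory describes:

```python
def parse_version(version: str) -> tuple:
    """Return (major, minor) from a version string like '0.9.1'."""
    major, minor = version.split(".")[:2]
    return (int(major), int(minor))

def is_exposed(version: str,
               multi_host_tensor_parallel: bool,
               v1_engine_active: bool) -> bool:
    """True if a deployment matches the vulnerable pattern:
    vLLM < 0.10.0, V0 engine active, tensor parallelism across hosts."""
    if not multi_host_tensor_parallel:
        return False   # single-host deployments are out of scope
    if v1_engine_active:
        return False   # the V1 engine is not affected
    return parse_version(version) < (0, 10)

# Examples matching the advisory:
print(is_exposed("0.9.1", True, False))   # True  - V0 multi-node, pre-0.10.0
print(is_exposed("0.10.0", True, True))   # False - patched, V1 active
print(is_exposed("0.8.2", True, True))    # False - V1 default since v0.8.0
```

In practice the inputs would come from your deployment inventory (installed package version, launch flags, and node topology) rather than hard-coded values.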
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
What is CVE-2025-30165?
CVE-2025-30165 is an unsafe pickle deserialization vulnerability in vLLM's V0 engine (versions 0.5.2-0.9.x). In multi-node tensor-parallel deployments, secondary hosts deserialize ZeroMQ messages from the primary host with pickle, allowing remote code execution on those nodes. The maintainers have declined to patch V0; the remediation is to upgrade to vLLM >= 0.10.0 with the V1 engine or to isolate cluster traffic on a strictly controlled network segment.
Is CVE-2025-30165 actively exploited?
No confirmed active exploitation of CVE-2025-30165 has been reported, but organizations should still patch proactively.
How to fix CVE-2025-30165?
Upgrade to vLLM >= 0.10.0 and confirm the V1 engine is active. If V0 multi-node tensor parallelism must be retained, isolate cluster traffic on a dedicated VLAN with strict ACLs, enable Dynamic ARP Inspection on managed switches, monitor for unexpected ZeroMQ traffic, and add mutual TLS or HMAC message authentication between nodes (see Recommended Action above for the full step-by-step guidance).
What systems are affected by CVE-2025-30165?
This vulnerability affects the following AI/ML architecture patterns: distributed LLM inference clusters, multi-node tensor parallel serving, LLM inference infrastructure, model serving pipelines.
What is the CVSS score for CVE-2025-30165?
CVE-2025-30165 has a CVSS v3.1 base score of 8.0 (HIGH). The EPSS exploitation probability is 1.31%.
Technical Details
NVD Description
vLLM is an inference and serving engine for large language models. In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a `SUB` ZeroMQ socket and connect to an `XPUB` socket on the primary vLLM host. When data is received on this `SUB` socket, it is deserialized with `pickle`. This is unsafe, as it can be abused to execute code on a remote machine. Since the vulnerability exists in a client that connects to the primary vLLM host, this vulnerability serves as an escalation point. If the primary vLLM host is compromised, this vulnerability could be used to compromise the rest of the hosts in the vLLM deployment. Attackers could also use other means to exploit the vulnerability without requiring access to the primary vLLM host. One example would be the use of ARP cache poisoning to redirect traffic to a malicious endpoint used to deliver a payload with arbitrary code to execute on the target machine. Note that this issue only affects the V0 engine, which has been off by default since v0.8.0. Further, the issue only applies to a deployment using tensor parallelism across multiple hosts, which we do not expect to be a common deployment pattern. Since V0 has been off by default since v0.8.0 and the fix is fairly invasive, the maintainers of vLLM have decided not to fix this issue. Instead, the maintainers recommend that users ensure their environment is on a secure network in case this pattern is in use. The V1 engine is not affected by this issue.
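Why `pickle.loads` on untrusted bytes amounts to code execution can be shown without any network setup. A minimal, deliberately benign demonstration: the `__reduce__` hook tells pickle to call an arbitrary callable during deserialization, here `eval` of an arithmetic expression as a harmless stand-in for `os.system`:

```python
import pickle

class Malicious:
    def __reduce__(self):
        # pickle will call eval("6*7") when this object is deserialized;
        # an attacker would substitute os.system or similar.
        return (eval, ("6*7",))

payload = pickle.dumps(Malicious())   # what an attacker would send over the SUB socket
result = pickle.loads(payload)        # code runs here, on the receiving node
print(result)                         # 42
```

Note that the receiving side never needs to import or even know about the `Malicious` class; the payload alone dictates what runs, which is exactly why deserializing network input with pickle is unsafe.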
Exploitation Scenario
An attacker positioned on the same network segment as a vLLM multi-node cluster performs ARP cache poisoning to impersonate the primary vLLM host's IP, hijacking the XPUB ZeroMQ socket. Secondary worker nodes — configured to subscribe to the primary's XPUB endpoint — begin receiving attacker-controlled ZeroMQ messages. These messages contain a crafted pickle payload encoding a Python object whose __reduce__ method executes arbitrary OS commands upon deserialization (e.g., reverse shell, credential dump, or ransomware). The attacker achieves simultaneous RCE on all subscribed secondary nodes, gaining control of GPU inference processes, model weights resident in memory, and any secrets accessible to the vLLM service account — all with no authentication and no user interaction required.
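The scenario works because frames arrive with no authentication and are fed straight into pickle. A minimal sketch of the HMAC-plus-safe-format hardening recommended above, using only Python's standard library; the key name, tag length, and framing layout are illustrative assumptions, not vLLM functionality:

```python
import hashlib
import hmac
import json

# Assumption: a shared key provisioned to all cluster nodes out of band.
SHARED_KEY = b"example-cluster-secret"
TAG_LEN = 32  # SHA-256 digest size

def frame(message: dict) -> bytes:
    """Serialize with JSON (no code execution on load) and prepend an HMAC tag."""
    body = json.dumps(message).encode()
    return hmac.new(SHARED_KEY, body, hashlib.sha256).digest() + body

def unframe(data: bytes) -> dict:
    """Verify the tag in constant time before parsing; reject forged frames."""
    tag, body = data[:TAG_LEN], data[TAG_LEN:]
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC verification failed: dropping frame")
    return json.loads(body)

msg = frame({"op": "weights_ready", "seq": 7})
print(unframe(msg))                   # {'op': 'weights_ready', 'seq': 7}

# A frame whose body was modified in transit fails verification:
tampered = msg[:TAG_LEN] + msg[TAG_LEN:].replace(b"7", b"9")
try:
    unframe(tampered)
except ValueError as e:
    print(e)                          # HMAC verification failed: dropping frame
```

An ARP-spoofing attacker without the shared key cannot produce a valid tag, so forged frames are dropped before any deserialization occurs; swapping pickle for JSON removes the code-execution primitive even if a frame somehow passes.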
Weaknesses (CWE)
- CWE-502: Deserialization of Untrusted Data
CVSS Vector
CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
References
- github.com/advisories/GHSA-9pcc-gvx5-r5wm
- nvd.nist.gov/vuln/detail/CVE-2025-30165
- github.com/vllm-project/vllm/blob/c21b99b91241409c2fdf9f3f8c542e8748b317be/vllm/distributed/device_communicators/shm_broadcast.py (Product)
- github.com/vllm-project/vllm/security/advisories/GHSA-9pcc-gvx5-r5wm (Vendor)
Related Vulnerabilities
All in the same package (vllm):
- CVE-2024-9053 (9.8): RCE via unsafe pickle deserialization in RPC server
- CVE-2024-11041 (9.8): RCE via unsafe pickle deserialization in MessageQueue
- CVE-2026-25960 (9.8): SSRF allows internal network access
- CVE-2025-47277 (9.8): RCE via exposed TCPStore in distributed inference
- CVE-2025-32444 (9.8): RCE via pickle deserialization on ZeroMQ