CVE-2025-30165: vLLM: pickle RCE in multi-node inference deployments

GHSA-9pcc-gvx5-r5wm HIGH
Published May 6, 2025
CISO Take

If your organization runs vLLM with tensor parallelism across multiple hosts using the V0 engine (versions 0.5.2–0.9.x), secondary inference nodes are exploitable via malicious pickle deserialization — and no patch is coming. Migrate to vLLM >= 0.10.0 with the V1 engine immediately, or isolate all vLLM cluster traffic on a strictly controlled network segment. Organizations on v0.8.0+ with default settings are not affected.

Risk Assessment

CVSS 8.0/High with adjacent-network attack vector. Exploitability is moderate: an attacker needs either network adjacency (enabling ARP cache poisoning) or a foothold on the primary vLLM host. No authentication is required and no user interaction is needed. The vendor has issued a formal wontfix — organizations still on V0 multi-node deployments face permanent, unpatched exposure. Real-world prevalence is limited given V0 is non-default since v0.8.0 and distributed tensor parallelism is uncommon outside large-scale GPU deployments, keeping EPSS low at 1.3%.

Affected Systems

Package Ecosystem Vulnerable Range Patched
vllm pip >= 0.5.2, < 0.10.0 0.10.0

Severity & Risk

CVSS 3.1
8.0 / 10
EPSS
1.3%
chance of exploitation in 30 days
Higher than 80% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Moderate

Attack Surface

Attack Vector (AV): Adjacent
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High

Recommended Action

  1. IMMEDIATE: Audit vLLM deployments and identify any multi-node tensor parallelism setups running vLLM < 0.10.0 or explicitly using the V0 engine.
  2. PATCH: Upgrade to vLLM >= 0.10.0 and confirm the V1 engine is active (default since v0.8.0; verify in startup logs and ensure V0 has not been re-enabled, e.g. via the VLLM_USE_V1=0 environment variable).
  3. NETWORK CONTROLS: If V0 must be retained, isolate vLLM cluster traffic on a dedicated VLAN with strict ACLs; enable Dynamic ARP Inspection (DAI) on managed switches to block ARP poisoning.
  4. MONITOR: Alert on unexpected ZeroMQ traffic from unauthorized sources within inference VLANs (the ports in use are deployment-specific; take them from your cluster configuration).
  5. HARDEN: Enforce mutual TLS or HMAC message authentication between inference nodes as defense-in-depth for any distributed ML serving infrastructure.
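As a starting point for the audit in step 1, the installed vLLM version can be checked against the vulnerable range programmatically. This is a minimal sketch assuming vLLM was installed via pip; it only inspects package metadata and uses a simplified version parser (a production audit should use a full PEP 440 parser such as `packaging.version`):

```python
from importlib.metadata import version, PackageNotFoundError

VULN_LOW, VULN_HIGH = (0, 5, 2), (0, 10, 0)  # vulnerable range: >= 0.5.2, < 0.10.0

def parse(ver: str) -> tuple:
    """Parse a dotted version string into a comparable tuple of ints.

    Simplified: keeps only the leading digits of each of the first three
    components, so pre-release suffixes like '0.9.1rc1' map to (0, 9, 1).
    """
    parts = []
    for piece in ver.split(".")[:3]:
        num = ""
        for ch in piece:
            if ch.isdigit():
                num += ch
            else:
                break
        parts.append(int(num) if num else 0)
    return tuple(parts)

def in_vulnerable_range(ver: str) -> bool:
    """True if the version falls in the CVE-2025-30165 affected range."""
    return VULN_LOW <= parse(ver) < VULN_HIGH

def audit() -> str:
    """Report whether the locally installed vllm package is in range."""
    try:
        v = version("vllm")
    except PackageNotFoundError:
        return "vllm not installed"
    if in_vulnerable_range(v):
        return f"vllm {v}: in vulnerable range; check for V0 multi-node tensor parallelism"
    return f"vllm {v}: outside vulnerable range"
```

A version check alone is not sufficient: a deployment in range is only exposed if it actually runs V0 multi-node tensor parallelism, so follow up with a configuration review.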

CISA SSVC Assessment

Decision: Track
Exploitation: None
Automatable: No
Technical Impact: Total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.1 - Policies for information security in AI systems
NIST AI RMF
MANAGE 2.2 - Mechanisms to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2025-30165?

If your organization runs vLLM with tensor parallelism across multiple hosts using the V0 engine (versions 0.5.2–0.9.x), secondary inference nodes are exploitable via malicious pickle deserialization — and no patch is coming. Migrate to vLLM >= 0.10.0 with the V1 engine immediately, or isolate all vLLM cluster traffic on a strictly controlled network segment. Organizations on v0.8.0+ with default settings are not affected.

Is CVE-2025-30165 actively exploited?

No confirmed active exploitation of CVE-2025-30165 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-30165?

1) IMMEDIATE: Audit vLLM deployments and identify any multi-node tensor parallelism setups running vLLM < 0.10.0 or explicitly using the V0 engine. 2) PATCH: Upgrade to vLLM >= 0.10.0 and confirm the V1 engine is active (default since v0.8.0; verify in startup logs and ensure V0 has not been re-enabled, e.g. via the VLLM_USE_V1=0 environment variable). 3) NETWORK CONTROLS: If V0 must be retained, isolate vLLM cluster traffic on a dedicated VLAN with strict ACLs; enable Dynamic ARP Inspection (DAI) on managed switches to block ARP poisoning. 4) MONITOR: Alert on unexpected ZeroMQ traffic from unauthorized sources within inference VLANs (the ports in use are deployment-specific; take them from your cluster configuration). 5) HARDEN: Enforce mutual TLS or HMAC message authentication between inference nodes as defense-in-depth for any distributed ML serving infrastructure.

What systems are affected by CVE-2025-30165?

This vulnerability affects the following AI/ML architecture patterns: distributed LLM inference clusters, multi-node tensor parallel serving, LLM inference infrastructure, model serving pipelines.

What is the CVSS score for CVE-2025-30165?

CVE-2025-30165 has a CVSS v3.1 base score of 8.0 (HIGH). The EPSS exploitation probability is 1.31%.

Technical Details

NVD Description

vLLM is an inference and serving engine for large language models. In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a `SUB` ZeroMQ socket and connect to an `XPUB` socket on the primary vLLM host. When data is received on this `SUB` socket, it is deserialized with `pickle`. This is unsafe, as it can be abused to execute code on a remote machine. Since the vulnerability exists in a client that connects to the primary vLLM host, this vulnerability serves as an escalation point. If the primary vLLM host is compromised, this vulnerability could be used to compromise the rest of the hosts in the vLLM deployment. Attackers could also use other means to exploit the vulnerability without requiring access to the primary vLLM host. One example would be the use of ARP cache poisoning to redirect traffic to a malicious endpoint used to deliver a payload with arbitrary code to execute on the target machine. Note that this issue only affects the V0 engine, which has been off by default since v0.8.0. Further, the issue only applies to a deployment using tensor parallelism across multiple hosts, which we do not expect to be a common deployment pattern. Since V0 has been off by default since v0.8.0 and the fix is fairly invasive, the maintainers of vLLM have decided not to fix this issue. Instead, the maintainers recommend that users ensure their environment is on a secure network in case this pattern is in use. The V1 engine is not affected by this issue.
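The core problem described above, calling `pickle.loads` on untrusted bytes, can be illustrated with a benign stand-in payload. `run_command` here is a hypothetical placeholder for whatever callable an attacker would actually choose (e.g. `os.system`), and the blob stands in for a message received on the `SUB` socket:

```python
import pickle

def run_command(cmd: str) -> str:
    # Hypothetical stand-in: a real payload would invoke os.system,
    # subprocess.call, or similar to execute `cmd` on the victim host.
    return f"executed: {cmd}"

class Payload:
    """When unpickled, pickle calls run_command("id") instead of
    reconstructing a Payload object -- __reduce__ controls this."""
    def __reduce__(self):
        return (run_command, ("id",))

blob = pickle.dumps(Payload())   # what the attacker sends over ZeroMQ
result = pickle.loads(blob)      # the secondary node's deserialization:
                                 # an attacker-chosen callable runs here
```

The key point is that `pickle.loads` executes the callable named by `__reduce__` during deserialization, so any process that unpickles attacker-controlled bytes is executing attacker-controlled code.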

Exploitation Scenario

An attacker positioned on the same network segment as a vLLM multi-node cluster performs ARP cache poisoning to impersonate the primary vLLM host's IP, hijacking the XPUB ZeroMQ socket. Secondary worker nodes — configured to subscribe to the primary's XPUB endpoint — begin receiving attacker-controlled ZeroMQ messages. These messages contain a crafted pickle payload encoding a Python object whose __reduce__ method executes arbitrary OS commands upon deserialization (e.g., reverse shell, credential dump, or ransomware). The attacker achieves simultaneous RCE on all subscribed secondary nodes, gaining control of GPU inference processes, model weights resident in memory, and any secrets accessible to the vLLM service account — all with no authentication and no user interaction required.
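The HMAC message authentication suggested in the recommendations can be sketched as follows. The shared key and message format are assumptions for illustration, and JSON replaces pickle so that deserialization itself cannot execute code:

```python
import hashlib
import hmac
import json

# Hypothetical shared secret, provisioned out-of-band to every node.
CLUSTER_KEY = b"example-32-byte-cluster-secret!!"

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes

def pack(message: dict) -> bytes:
    """Serialize with JSON (no code execution on load) and prepend a MAC."""
    body = json.dumps(message, sort_keys=True).encode()
    tag = hmac.new(CLUSTER_KEY, body, hashlib.sha256).digest()
    return tag + body

def unpack(wire: bytes) -> dict:
    """Verify the MAC in constant time before touching the payload."""
    tag, body = wire[:TAG_LEN], wire[TAG_LEN:]
    expected = hmac.new(CLUSTER_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("unauthenticated message dropped")
    return json.loads(body)
```

An on-path attacker who cannot obtain `CLUSTER_KEY` can no longer substitute payloads that the receiver will accept. This mitigates the spoofing path but is defense-in-depth only; it does not replace migrating off the V0 engine.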

CVSS Vector

CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H

Timeline

Published
May 6, 2025
Last Modified
December 5, 2025
First Seen
May 6, 2025

Related Vulnerabilities