CVE-2025-47277: vLLM: RCE via exposed TCPStore in distributed inference

GHSA-hjq4-87xh-g4fv · CRITICAL · PoC available · CISA SSVC: Track*
Published May 20, 2025
CISO Take

If your organization runs vLLM in distributed mode with PyNcclPipe KV cache transfer and V0 engine, you have an unauthenticated RCE vulnerability reachable from the network — patch to 0.8.5 immediately. The TCPStore socket was binding to all interfaces instead of the private KV network, meaning any host that can reach that port can send a malicious pickle payload and execute arbitrary code. This is a 9.8 CVSS fire drill for any team running distributed LLM inference at scale.

Risk Assessment

Severity is effectively maximum for affected configurations: no authentication, no user interaction, network-exploitable, and CWE-502 deserialization means RCE is the likely outcome. Scope is narrow — only PyNcclPipe + V0 engine users — but that covers high-value targets: orgs running distributed multi-GPU or multi-node vLLM inference, which are typically the largest and most sensitive deployments. EPSS at 0.865% reflects no observed exploitation yet, but the low technical barrier (find the port, send a pickle payload) means weaponization is fast. Not in KEV, but treat this as pre-KEV.

Affected Systems

Package  Ecosystem  Vulnerable Range   Patched
vllm     pip        >= 0.6.5, < 0.8.5  0.8.5

Severity & Risk

CVSS 3.1
9.8 / 10
EPSS
0.9%
chance of exploitation in 30 days
Higher than 75% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Moderate
Exploitation Confidence
Medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

Metric                    Value
Attack Vector (AV)        Network
Attack Complexity (AC)    Low
Privileges Required (PR)  None
User Interaction (UI)     None
Scope (S)                 Unchanged
Confidentiality (C)       High
Integrity (I)             High
Availability (A)          High

Recommended Action

  1. PATCH: Upgrade vLLM to 0.8.5 immediately; this is the only complete fix.

  2. WORKAROUND (if patching is delayed): Use host-level firewall rules (iptables/security groups) to restrict access to the KV cache port (the --kv-ip target) to trusted inference nodes only.

  3. VERIFY: Confirm which nodes run PyNcclPipe + V0 engine by checking launch configs for --kv-transfer-config with PyNcclPipe and the absence of --enable-v1.

  4. DETECT: Scan for unexpected connections to the TCPStore port from IPs outside the inference cluster. Check for unusual process spawning from vLLM worker processes.

  5. AUDIT: Review cloud security group rules; the vLLM docs warned about network isolation, but the default bind behavior was insecure.
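The firewall workaround in step 2 can be sanity-checked from an untrusted vantage point. The sketch below is a minimal reachability probe using only the Python standard library; the host and port in the usage note are placeholders, not values from this advisory, so substitute your cluster's actual --kv-ip address and KV transfer port.

```python
import socket


def is_port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Run this from a host that should NOT have access to the KV network, e.g. `is_port_reachable("10.0.0.5", 14579)` with your own values; a True result means the port is still exposed and the firewall restriction is incomplete.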

CISA SSVC Assessment

Decision Track*
Exploitation none
Automatable Yes
Technical Impact total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.6 - AI system security
NIST AI RMF
MANAGE 2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM03:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2025-47277?

CVE-2025-47277 is an unauthenticated remote code execution vulnerability in vLLM versions 0.6.5 through 0.8.4, affecting distributed deployments that use the PyNcclPipe KV cache transfer integration with the V0 engine. The underlying PyTorch TCPStore socket listens on all network interfaces rather than only the private KV network, so any host that can reach the port can send a malicious pickle payload and execute arbitrary code on the inference node. Upgrading to vLLM 0.8.5 fixes the issue.

Is CVE-2025-47277 actively exploited?

No active exploitation has been observed (CISA SSVC rates Exploitation as "none"), but proof-of-concept exploit code is publicly available for CVE-2025-47277, increasing the risk of exploitation.

How to fix CVE-2025-47277?

1. PATCH: Upgrade vLLM to 0.8.5 immediately; this is the only complete fix.
2. WORKAROUND (if patching is delayed): Use host-level firewall rules (iptables/security groups) to restrict access to the KV cache port (the --kv-ip target) to trusted inference nodes only.
3. VERIFY: Confirm which nodes run PyNcclPipe + V0 engine by checking launch configs for --kv-transfer-config with PyNcclPipe and the absence of --enable-v1.
4. DETECT: Scan for unexpected connections to the TCPStore port from IPs outside the inference cluster. Check for unusual process spawning from vLLM worker processes.
5. AUDIT: Review cloud security group rules; the vLLM docs warned about network isolation, but the default bind behavior was insecure.

What systems are affected by CVE-2025-47277?

This vulnerability affects the following AI/ML architecture patterns: distributed LLM inference, multi-node model serving, disaggregated prefill/decode inference, multi-GPU inference clusters.

What is the CVSS score for CVE-2025-47277?

CVE-2025-47277 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 0.86%.

Technical Details

NVD Description

vLLM, an inference and serving engine for large language models (LLMs), has an issue in versions 0.6.5 through 0.8.4 that ONLY impacts environments using the `PyNcclPipe` KV cache transfer integration with the V0 engine. No other configurations are affected. vLLM supports the use of the `PyNcclPipe` class to establish a peer-to-peer communication domain for data transmission between distributed nodes. GPU-side KV-Cache transmission is implemented through the `PyNcclCommunicator` class, while CPU-side control message passing is handled via the `send_obj` and `recv_obj` methods. The intention was that this interface should only be exposed to a private network using the IP address specified by the `--kv-ip` CLI parameter, and the vLLM documentation covers how this must be limited to a secured network. However, the default and intentional behavior of PyTorch is that the `TCPStore` interface listens on ALL interfaces, regardless of what IP address is provided; the provided IP address was only used as a client-side address. vLLM now applies a workaround that forces the `TCPStore` instance to bind its socket to the specified private interface. As of version 0.8.5, vLLM limits the `TCPStore` socket to the private interface as configured.
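The root cause described above can be illustrated with plain standard-library sockets (this is not the actual PyTorch `TCPStore` code, just a sketch of the same bind semantics): a listener bound to the empty/wildcard address accepts connections on every interface, while binding to a specific private address restricts reachability.

```python
import socket


def make_listener(bind_addr: str, port: int = 0) -> socket.socket:
    """Open a listening TCP socket bound to bind_addr and return it."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((bind_addr, port))
    s.listen(1)
    return s


# Vulnerable pattern: listens on ALL interfaces regardless of intent.
wide_open = make_listener("")            # getsockname() -> ("0.0.0.0", ...)

# Fixed pattern: socket only reachable via the given interface.
# 127.0.0.1 here is a stand-in for the private --kv-ip address.
restricted = make_listener("127.0.0.1")  # getsockname() -> ("127.0.0.1", ...)

wide_open.close()
restricted.close()
```

The 0.8.5 fix is the second pattern: forcing the bind to the configured private interface instead of relying on the address being honored downstream.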

Exploitation Scenario

Attacker scans for organizations running vLLM (job postings, GitHub repos, API fingerprinting). They identify a distributed inference cluster where the TCPStore port is reachable (misconfigured security group or internal network access via another compromise). Using PyTorch's distributed communication protocol, they connect to the exposed TCPStore and send a maliciously crafted serialized Python object via the recv_obj/send_obj interface. PyTorch deserializes the pickle payload, executing arbitrary code on the inference worker node. From there, the attacker can exfiltrate model weights, read KV cache contents containing other users' prompts, establish persistence, or pivot deeper into the ML infrastructure.
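The deserialization step in this scenario is classic CWE-502. The sketch below is a deliberately benign illustration of why attacker-controlled pickle bytes mean code execution: `pickle.loads()` invokes whatever callable the payload's `__reduce__` specifies. Here it merely calls `os.getcwd`, but an attacker crafting the bytes can substitute any function.

```python
import os
import pickle


class Payload:
    """Benign stand-in for a malicious pickle payload."""

    def __reduce__(self):
        # A hostile payload would return something like (os.system, ("<cmd>",)).
        # We return a harmless callable to show the mechanism safely.
        return (os.getcwd, ())


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # os.getcwd() executes DURING deserialization
```

Note that the code runs inside `loads()` itself; by the time the caller inspects the result, the attacker's function has already executed on the worker.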

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
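For readers mapping this vector back to the Attack Surface table, a minimal parser for the standard slash-separated "Metric:Value" form is sketched below (an illustrative helper, not a full CVSS implementation, and `parse_cvss_vector` is a name chosen here, not a library API).

```python
def parse_cvss_vector(vector: str) -> dict:
    """Split 'CVSS:3.1/AV:N/...' into {'version': '3.1', 'AV': 'N', ...}."""
    head, *metrics = vector.split("/")
    parsed = {"version": head.split(":")[1]}  # head is "CVSS:3.1"
    for metric in metrics:
        key, value = metric.split(":")
        parsed[key] = value
    return parsed


v = parse_cvss_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
# v["AV"] == "N" (Network), v["PR"] == "N" (no privileges required),
# v["C"] == v["I"] == v["A"] == "H" (high impact across the board)
```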

Timeline

Published
May 20, 2025
Last Modified
August 13, 2025
First Seen
May 20, 2025

Related Vulnerabilities