CVE-2025-47277: vLLM: RCE via exposed TCPStore in distributed inference

GHSA-hjq4-87xh-g4fv CRITICAL PoC AVAILABLE CISA: TRACK*
Published May 20, 2025
CISO Take

If your organization runs vLLM in distributed mode with PyNcclPipe KV cache transfer and V0 engine, you have an unauthenticated RCE vulnerability reachable from the network — patch to 0.8.5 immediately. The TCPStore socket was binding to all interfaces instead of the private KV network, meaning any host that can reach that port can send a malicious pickle payload and execute arbitrary code. This is a 9.8 CVSS fire drill for any team running distributed LLM inference at scale.

What is the risk?

Severity is effectively maximum for affected configurations: no authentication, no user interaction, network-exploitable, and CWE-502 deserialization means RCE is the likely outcome. Scope is narrow (only PyNcclPipe + V0 engine users), but it covers high-value targets: organizations running distributed multi-GPU or multi-node vLLM inference, which are typically the largest and most sensitive deployments. An EPSS score of 0.865% is consistent with no observed exploitation so far, but the low technical barrier (find the port, send a pickle payload) means weaponization could be fast. It is not in KEV, but treat it as pre-KEV.

What systems are affected?

Package   Ecosystem   Vulnerable Range     Patched
vLLM      pip         >= 0.6.5, < 0.8.5    0.8.5
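To triage many hosts, the affected range above can be checked programmatically. This is a minimal sketch assuming ordinary PEP 440-style version strings; the helper names `version_key` and `is_vulnerable` are hypothetical. The simplified parser treats pre-release and dev builds (for example `0.8.5rc1`) as older than the final release, which matches PEP 440 ordering; for full PEP 440 semantics, the `packaging` library's `Version` class is the more robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Affected range from the advisory: >= 0.6.5, < 0.8.5 (fixed in 0.8.5).
# The 4th tuple element is 1 for final/post releases, 0 for pre/dev builds.
FIRST_AFFECTED = (0, 6, 5, 1)
FIRST_FIXED = (0, 8, 5, 1)

_VERSION_RE = re.compile(r"^\s*(\d+)\.(\d+)\.(\d+)(.*)$")
# Suffixes marking a pre-release or dev build, which sorts *before* the
# corresponding final release under PEP 440 (e.g. 0.8.5rc1 < 0.8.5).
_PRE_RE = re.compile(r"^[.\-_]?(a|b|c|rc|alpha|beta|pre|preview|dev)", re.IGNORECASE)


def version_key(ver: str) -> tuple:
    """Reduce a version string to (major, minor, patch, is_final_or_post)."""
    m = _VERSION_RE.match(ver)
    if m is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    major, minor, patch, suffix = m.groups()
    is_final = 0 if _PRE_RE.match(suffix) else 1
    return (int(major), int(minor), int(patch), is_final)


def is_vulnerable(ver: str) -> bool:
    """True if ver falls inside the advisory's affected range."""
    return FIRST_AFFECTED <= version_key(ver) < FIRST_FIXED


if __name__ == "__main__":
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        verdict = "in affected range" if is_vulnerable(installed) else "outside affected range"
        print(f"vllm {installed}: {verdict}")
```

Note that a dev build such as `0.8.5.dev…` is reported as affected even if it was cut after the fix landed; that is the conservative choice for a triage script.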

How severe is it?

CVSS 3.1: 9.8 / 10
EPSS: 0.9% chance of exploitation in 30 days (higher than 56% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Moderate
Exploitation Confidence: medium (public PoC indexed by trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

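Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High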

What should I do?

  1. PATCH

    Upgrade vLLM to 0.8.5 immediately — this is the only complete fix.

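  2. WORKAROUND (if patching is delayed)

    Use host-level firewall rules (iptables) or cloud security groups to restrict access to the KV transfer (TCPStore) port so that only trusted inference nodes can reach it. Because the unpatched socket listens on all interfaces, apply the restriction on every interface, not only on the --kv-ip address.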

  3. VERIFY

    Confirm which nodes run PyNcclPipe + V0 engine by checking launch configs for --kv-transfer-config with PyNcclPipe and absence of --enable-v1.

  4. DETECT

    Scan for unexpected connections to the TCPStore port from non-inference-cluster IPs. Check for unusual process spawning from vLLM worker processes.

  5. AUDIT

    Review cloud security group rules: the vLLM documentation called for network isolation of the KV transfer interface, but the default socket binding (all interfaces) was insecure.
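The VERIFY and DETECT steps can be partly scripted. The following is a minimal standard-library sketch: `untrusted_peers` flags remote addresses (for example, collected from `ss`/`netstat` output or VPC flow logs) that fall outside an allowlist of inference-node subnets, and `port_is_reachable` checks, from a host that should not have access, whether the firewall workaround actually blocks the KV transfer port. The subnet `10.0.0.0/24` and both function names are hypothetical placeholders.

```python
import ipaddress
import socket
from typing import Iterable, List, Sequence

# Hypothetical allowlist: replace with the subnets your inference nodes use.
TRUSTED_NETWORKS = (ipaddress.ip_network("10.0.0.0/24"),)


def untrusted_peers(peers: Iterable[str],
                    trusted: Sequence = TRUSTED_NETWORKS) -> List[str]:
    """Return the peer IPs that fall outside every trusted network."""
    flagged = []
    for peer in peers:
        addr = ipaddress.ip_address(peer)
        # Compare like with like: an IPv6 peer never matches an IPv4 subnet.
        if not any(addr.version == net.version and addr in net for net in trusted):
            flagged.append(peer)
    return flagged


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a TCP connect; True means something accepted the connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out (e.g. dropped by a firewall), unreachable
        return False


if __name__ == "__main__":
    # Peer addresses would come from ss/netstat output or flow logs.
    print(untrusted_peers(["10.0.0.12", "10.0.0.40", "198.51.100.23"]))  # ['198.51.100.23']
```

Run `port_is_reachable` from a machine outside the inference cluster: a `True` result means the TCPStore port is still exposed despite the firewall rules.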

What does CISA's SSVC say?

Decision: Track*
Exploitation: none
Automatable: Yes
Technical Impact: total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.6 - AI system security
NIST AI RMF
MANAGE 2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM03:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2025-47277?

CVE-2025-47277 is an unauthenticated, network-reachable remote code execution vulnerability (CVSS 9.8) in vLLM 0.6.5 through 0.8.4. It affects only deployments that use the PyNcclPipe KV cache transfer integration with the V0 engine, where the TCPStore socket listened on all interfaces instead of only the private --kv-ip address, so any host that can reach the port can send a malicious pickle payload and execute arbitrary code. It is fixed in vLLM 0.8.5.

Is CVE-2025-47277 actively exploited?

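Proof-of-concept exploit code for CVE-2025-47277 is publicly available (indexed by trickest/cve), which increases the risk of exploitation. No in-the-wild exploitation has been reported: CISA's SSVC assessment rates exploitation as "none", and the CVE is not in KEV.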

How to fix CVE-2025-47277?

Upgrade vLLM to 0.8.5; this is the only complete fix. If patching is delayed, use host firewall rules or cloud security groups so that only trusted inference nodes can reach the KV transfer port. Then verify which nodes run PyNcclPipe with the V0 engine, watch for connections to the TCPStore port from outside the inference cluster and for unusual processes spawned by vLLM workers, and audit cloud security group rules. The five steps are detailed under "What should I do?" above.

What systems are affected by CVE-2025-47277?

This vulnerability affects the following AI/ML architecture patterns: distributed LLM inference, multi-node model serving, disaggregated prefill/decode inference, multi-GPU inference clusters.

What is the CVSS score for CVE-2025-47277?

CVE-2025-47277 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 0.93%.

What is the AI security impact?

Affected AI Architectures

distributed LLM inference, multi-node model serving, disaggregated prefill/decode inference, multi-GPU inference clusters

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0025 Exfiltration via Cyber Means
AML.T0049 Exploit Public-Facing Application
AML.T0072 Reverse Shell

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.2.6
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM03:2025

What are the technical details?

Original Advisory

vLLM, an inference and serving engine for large language models (LLMs), has an issue in versions 0.6.5 through 0.8.4 that ONLY impacts environments using the `PyNcclPipe` KV cache transfer integration with the V0 engine. No other configurations are affected.

vLLM supports the use of the `PyNcclPipe` class to establish a peer-to-peer communication domain for data transmission between distributed nodes. GPU-side KV cache transmission is implemented through the `PyNcclCommunicator` class, while CPU-side control messages are passed via the `send_obj` and `recv_obj` methods. The intention was that this interface should only be exposed to a private network, using the IP address specified by the `--kv-ip` CLI parameter, and the vLLM documentation states that it must be limited to a secured network.

However, the default and intentional behavior of PyTorch's `TCPStore` is to listen on ALL interfaces, regardless of the IP address provided; that address was only used as the client-side address to connect to. vLLM was fixed with a workaround that forces the `TCPStore` instance to bind its socket to the specified private interface. As of version 0.8.5, vLLM limits the `TCPStore` socket to the private interface as configured.
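The root cause is the difference between a wildcard bind and an interface-specific bind. The sketch below illustrates that difference with Python's standard `socket` module only; it is not PyTorch's `TCPStore` and not vLLM's patch. `127.0.0.1` stands in for the private `--kv-ip` address, and the helper names `open_listener` and `bound_address` are hypothetical.

```python
import socket


def open_listener(bind_ip: str) -> socket.socket:
    """Open a TCP listener on bind_ip with an OS-assigned port.

    "0.0.0.0" is the IPv4 wildcard: the socket accepts connections arriving
    on every interface, which is the behavior the advisory describes for
    TCPStore. A concrete address restricts the listener to that interface.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((bind_ip, 0))
    sock.listen()
    return sock


def bound_address(sock: socket.socket) -> str:
    """Return the local IP the listening socket is bound to."""
    return sock.getsockname()[0]


if __name__ == "__main__":
    wildcard = open_listener("0.0.0.0")   # analogous to the pre-0.8.5 exposure
    private = open_listener("127.0.0.1")  # stand-in for the --kv-ip address
    print(bound_address(wildcard))  # 0.0.0.0
    print(bound_address(private))   # 127.0.0.1
    wildcard.close()
    private.close()
```

Only the second listener is limited to one interface; the first is reachable through any interface that routing and firewall rules allow, which is why network controls were the sole barrier before 0.8.5.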

Exploitation Scenario

Attacker scans for organizations running vLLM (job postings, GitHub repos, API fingerprinting). They identify a distributed inference cluster where the TCPStore port is reachable (misconfigured security group or internal network access via another compromise). Using PyTorch's distributed communication protocol, they connect to the exposed TCPStore and send a maliciously crafted serialized Python object via the recv_obj/send_obj interface. PyTorch deserializes the pickle payload, executing arbitrary code on the inference worker node. From there, the attacker can exfiltrate model weights, read KV cache contents containing other users' prompts, establish persistence, or pivot deeper into the ML infrastructure.
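The last step of this scenario, deserialization turning into code execution, is standard behavior of Python's `pickle` module rather than anything specific to vLLM. The sketch below is a harmless illustration, not the published PoC; `Payload` and `receive` are hypothetical names. An object's `__reduce__` method tells pickle which callable to invoke, with which arguments, when the bytes are loaded, so the receiver runs attacker-chosen code before it ever inspects the "object".

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled object.

    pickle calls __reduce__ while serializing; the (callable, args) pair it
    returns is written into the byte stream and *invoked* by pickle.loads
    on the receiving side.
    """

    def __reduce__(self):
        # A real exploit would return something like (os.system, ("<cmd>",)).
        # eval of a harmless arithmetic expression keeps this demo safe.
        return (eval, ("6 * 7", {}))


def receive(blob: bytes):
    """Stand-in for a receiver that trusts whatever arrives on the socket."""
    return pickle.loads(blob)


if __name__ == "__main__":
    wire_bytes = pickle.dumps(Payload())
    print(receive(wire_bytes))  # 42: eval ran inside pickle.loads
```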

Weaknesses (CWE)

CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

  • [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
  • [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
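For a pickle-based control channel, the first mitigation above can be combined with a Python-specific hardening: an HMAC over the serialized bytes, so only holders of a shared key can produce accepted messages, plus an unpickler that refuses to resolve globals (the "restricting globals" pattern from the Python `pickle` documentation). This is a generic sketch, not vLLM's fix (0.8.5 fixed the socket binding); the shared key and how it is distributed are assumptions, and `seal`, `unseal`, and `AllowlistUnpickler` are hypothetical names. Where possible, a non-executable format such as JSON is the simpler alternative.

```python
import hashlib
import hmac
import io
import pickle

TAG_LEN = hashlib.sha256().digest_size  # 32-byte HMAC-SHA256 tag


class AllowlistUnpickler(pickle.Unpickler):
    """Refuse to resolve any global that is not explicitly allowed.

    Plain data (dict, list, tuple, str, bytes, int, float, bool, None) is
    encoded with dedicated opcodes and never reaches find_class, so an
    empty allowlist is enough for simple control messages.
    """

    ALLOWED = frozenset()  # e.g. {("collections", "OrderedDict")} if needed

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"forbidden global: {module}.{name}")


def seal(obj, key: bytes) -> bytes:
    """Serialize obj and prepend an HMAC-SHA256 tag over the bytes."""
    data = pickle.dumps(obj)
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data


def unseal(blob: bytes, key: bytes):
    """Verify the tag in constant time, then unpickle with the allowlist."""
    tag, data = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC verification failed; refusing to deserialize")
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

The HMAC check rejects messages from anyone without the key, and the allowlist still blocks a `__reduce__`-style payload if a key holder is compromised.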

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
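The vector string encodes the metric values listed under "What is the attack surface?". The sketch below implements the CVSS v3.1 base-score equations for Scope Unchanged vectors (the only case needed here), using the metric weights and Roundup function from FIRST's CVSS v3.1 specification. Scope Changed, which uses different Privileges Required weights and a different impact formula, is deliberately left unimplemented; the function names are hypothetical.

```python
import math

# CVSS v3.1 metric weights from the FIRST specification.
# The PR weights below are the Scope Unchanged values.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def parse_vector(vector: str) -> dict:
    """Split 'CVSS:3.1/AV:N/...' into {'AV': 'N', ...}."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"not a CVSS v3.1 vector: {vector!r}")
    return dict(part.split(":", 1) for part in parts)


def roundup(value: float) -> float:
    """Smallest one-decimal number >= value, using the spec's integer-based
    Roundup to avoid floating-point artefacts."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    if m["S"] != "U":
        raise NotImplementedError("this sketch covers Scope Unchanged only")
    w = {k: WEIGHTS[k][m[k]] for k in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

For this vector, impact is about 5.87 and exploitability about 3.89; their sum (about 9.76) rounds up to the published 9.8.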

Timeline

Published: May 20, 2025
Last Modified: August 13, 2025
First Seen: May 20, 2025

Related Vulnerabilities