CVE-2025-47277 — CRITICAL (CVSS 9.8) AI Security Vulnerability

Q: Is CVE-2025-47277 actively exploited?

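No active exploitation has been reported: the CVE is not in CISA KEV, and the CISA SSVC assessment lists exploitation as "none". However, proof-of-concept exploit code is publicly available (indexed in trickest/cve), which raises the risk of exploitation.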

Q: How to fix CVE-2025-47277?

1. PATCH: Upgrade vLLM to 0.8.5 or later. This is the only complete fix.
2. WORKAROUND (if patching is delayed): Use host-level firewall rules (iptables or cloud security groups) to restrict access to the KV cache port (--kv-ip target) to trusted inference nodes only.
3. VERIFY: Confirm which nodes run PyNcclPipe with the V0 engine by checking launch configs for --kv-transfer-config with PyNcclPipe and confirming that the V1 engine is not in use (in vLLM releases of this period the engine is selected with the VLLM_USE_V1 environment variable rather than an --enable-v1 flag).
4. DETECT: Scan for unexpected connections to the TCPStore port from IPs outside the inference cluster. Check for unusual process spawning from vLLM worker processes.
5. AUDIT: Review cloud security group rules. The vLLM docs warned that this interface must be limited to an isolated network, but before 0.8.5 the TCPStore listener bound to all interfaces regardless of the configured address.
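The VERIFY step can be partly automated by scanning launch commands. The sketch below is a minimal illustration, not a vLLM tool: the function name and the example commands are hypothetical, and the JSON keys shown in the example config are assumptions used only for illustration. It reports whether a command line passes a --kv-transfer-config value that mentions PyNccl; it does not determine which engine version is in use.

```python
import shlex


def uses_pynccl_kv_transfer(command: str) -> bool:
    """Return True if the command passes --kv-transfer-config with a value mentioning PyNccl."""
    args = shlex.split(command)
    for index, arg in enumerate(args):
        if arg == "--kv-transfer-config" and index + 1 < len(args):
            raw_value = args[index + 1]          # form: --kv-transfer-config '<json>'
        elif arg.startswith("--kv-transfer-config="):
            raw_value = arg.split("=", 1)[1]     # form: --kv-transfer-config='<json>'
        else:
            continue
        # A case-insensitive substring match avoids depending on exact key names.
        return "pynccl" in raw_value.lower()
    return False


if __name__ == "__main__":
    # Hypothetical launch commands; the JSON keys are illustrative assumptions.
    commands = [
        "vllm serve my-model --kv-transfer-config "
        "'{\"kv_connector\": \"PyNcclConnector\", \"kv_role\": \"kv_producer\"}'",
        "vllm serve my-model --port 8000",
    ]
    for command in commands:
        print(uses_pynccl_kv_transfer(command), command[:40])
```

Nodes flagged by a check like this still need a manual look at the engine setting before they are counted as affected.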

Q: What systems are affected by CVE-2025-47277?

This vulnerability affects the following AI/ML architecture patterns: distributed LLM inference, multi-node model serving, disaggregated prefill/decode inference, multi-GPU inference clusters.

Q: What is the CVSS score for CVE-2025-47277?

CVE-2025-47277 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 0.86%.

CISO Take

If your organization runs vLLM in distributed mode with PyNcclPipe KV cache transfer and V0 engine, you have an unauthenticated RCE vulnerability reachable from the network — patch to 0.8.5 immediately. The TCPStore socket was binding to all interfaces instead of the private KV network, meaning any host that can reach that port can send a malicious pickle payload and execute arbitrary code. This is a 9.8 CVSS fire drill for any team running distributed LLM inference at scale.

Risk Assessment

Severity is effectively maximum for affected configurations: no authentication, no user interaction, network-exploitable, and CWE-502 deserialization means RCE is the likely outcome. Scope is narrow — only PyNcclPipe + V0 engine users — but that covers high-value targets: orgs running distributed multi-GPU or multi-node vLLM inference, which are typically the largest and most sensitive deployments. EPSS at 0.865% reflects no observed exploitation yet, but the low technical barrier (find the port, send a pickle payload) means weaponization is fast. Not in KEV, but treat this as pre-KEV.

Affected Systems

Package	Ecosystem	Vulnerable Range	Patched
vllm	pip	—	No patch
vllm	pip	>= 0.6.5, < 0.8.5	`0.8.5`
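The vulnerable range above (>= 0.6.5, < 0.8.5) can be checked against an installed version string. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, pre-release and post-release suffixes are ignored, and a version inside the range is only exposed when PyNcclPipe is used with the V0 engine.

```python
import re
from importlib import metadata

VULNERABLE_MIN = (0, 6, 5)   # first affected release
PATCHED = (0, 8, 5)          # first fixed release


def parse_release(version: str) -> tuple:
    """Parse the leading numeric release segment, e.g. '0.8.4.post1' -> (0, 8, 4)."""
    match = re.match(r"\s*v?(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group(1).split("."))
    # Pad short versions such as '0.7' to three components.
    return parts + (0,) * (3 - len(parts))


def is_vulnerable_version(version: str) -> bool:
    """True if the release segment falls in [0.6.5, 0.8.5)."""
    return VULNERABLE_MIN <= parse_release(version) < PATCHED


if __name__ == "__main__":
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        print(installed, "in vulnerable range:", is_vulnerable_version(installed))
```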

Severity & Risk

CVSS 3.1: 9.8 / 10

EPSS: 0.9% chance of exploitation in 30 days (rounded from 0.86%); higher than 75% of all CVEs

Source: EPSS v3, FIRST.org

Exploitation Status

Exploit available: yes (public PoC indexed in trickest/cve)

Exploitation: MEDIUM

Sophistication: Moderate

Exploitation confidence: medium

Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

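Attack Vector (AV): Network

Attack Complexity (AC): Low

Privileges Required (PR): None

User Interaction (UI): None

Scope (S): Unchanged

Confidentiality (C): High

Integrity (I): High

Availability (A): High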
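These metric values are what produce the 9.8 base score. As a worked check, the sketch below applies the CVSS v3.1 base score equations for the Scope Unchanged case, using the metric weights from the CVSS v3.1 specification; the function names are my own and the code covers only the unchanged-scope branch.

```python
import math

# CVSS v3.1 metric weights (Scope Unchanged values for Privileges Required).
ATTACK_VECTOR = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
ATTACK_COMPLEXITY = {"L": 0.77, "H": 0.44}
PRIVILEGES_REQUIRED = {"N": 0.85, "L": 0.62, "H": 0.27}
USER_INTERACTION = {"N": 0.85, "R": 0.62}
CIA_IMPACT = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a vector with S:U, given single-letter metric values."""
    iss = 1 - (1 - CIA_IMPACT[c]) * (1 - CIA_IMPACT[i]) * (1 - CIA_IMPACT[a])
    impact = 6.42 * iss
    exploitability = (8.22 * ATTACK_VECTOR[av] * ATTACK_COMPLEXITY[ac]
                      * PRIVILEGES_REQUIRED[pr] * USER_INTERACTION[ui])
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    # AV:N / AC:L / PR:N / UI:N / S:U / C:H / I:H / A:H
    print(base_score_scope_unchanged("N", "L", "N", "N", "H", "H", "H"))
```

With these inputs the impact sub-score is about 5.87 and the exploitability sub-score about 3.89; their sum, roughly 9.76, rounds up to 9.8.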

Recommended Action

5 steps

PATCH

Upgrade vLLM to 0.8.5 or later. This is the only complete fix.

WORKAROUND (if patching is delayed)

Use host-level firewall rules (iptables or cloud security groups) to restrict access to the KV cache port (--kv-ip target) to trusted inference nodes only.

VERIFY

Confirm which nodes run PyNcclPipe with the V0 engine by checking launch configs for --kv-transfer-config with PyNcclPipe and confirming that the V1 engine is not in use (in vLLM releases of this period the engine is selected with the VLLM_USE_V1 environment variable rather than an --enable-v1 flag).

DETECT

Scan for unexpected connections to the TCPStore port from IPs outside the inference cluster. Check for unusual process spawning from vLLM worker processes.

AUDIT

Review cloud security group rules. The vLLM docs warned that this interface must be limited to an isolated network, but before 0.8.5 the TCPStore listener bound to all interfaces regardless of the configured address.
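For the DETECT step, one simple approach is to compare observed peer addresses on the TCPStore port against the CIDR ranges of the inference cluster. The sketch below assumes the peer list has already been collected from flow logs or socket listings; the function name, addresses, and CIDR range are hypothetical.

```python
import ipaddress


def untrusted_peers(peers, trusted_cidrs):
    """Return the peer addresses that fall outside every trusted network."""
    networks = [ipaddress.ip_network(cidr) for cidr in trusted_cidrs]
    flagged = []
    for peer in peers:
        address = ipaddress.ip_address(peer)
        # An address never matches a network of the other IP version.
        if not any(address in network for network in networks):
            flagged.append(peer)
    return flagged


if __name__ == "__main__":
    # Hypothetical data: peers seen connecting to the TCPStore port.
    observed = ["10.0.1.5", "10.0.1.9", "203.0.113.7"]
    cluster = ["10.0.1.0/24"]
    print(untrusted_peers(observed, cluster))
```

Any address this returns deserves investigation, since only inference nodes should ever reach that port.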

CISA SSVC Assessment

Decision: Track*

Exploitation: none

Automatable: Yes

Technical Impact: total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Tags: Code Execution, Data Extraction, Inference Framework

MITRE ATLAS techniques:

AML.T0010.001 - AI Software
AML.T0025 - Exfiltration via Cyber Means
AML.T0049 - Exploit Public-Facing Application
AML.T0072 - Reverse Shell

Compliance Impact

This CVE is relevant to:

EU AI Act: Article 15 - Accuracy, robustness and cybersecurity

ISO 42001: A.6.2.6 - AI system security

NIST AI RMF: MANAGE 2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems

OWASP LLM Top 10: LLM03:2025 - Supply Chain Vulnerabilities

Technical Details

NVD Description

vLLM, an inference and serving engine for large language models (LLMs), has an issue in versions 0.6.5 through 0.8.4 that ONLY impacts environments using the `PyNcclPipe` KV cache transfer integration with the V0 engine. No other configurations are affected. vLLM supports the use of the `PyNcclPipe` class to establish a peer-to-peer communication domain for data transmission between distributed nodes. The GPU-side KV-Cache transmission is implemented through the `PyNcclCommunicator` class, while CPU-side control message passing is handled via the `send_obj` and `recv_obj` methods on the CPU side. The intention was that this interface should only be exposed to a private network using the IP address specified by the `--kv-ip` CLI parameter. The vLLM documentation covers how this must be limited to a secured network. The default and intentional behavior from PyTorch is that the `TCPStore` interface listens on ALL interfaces, regardless of what IP address is provided. The IP address given was only used as a client-side address to use. vLLM was fixed to use a workaround to force the `TCPStore` instance to bind its socket to a specified private interface. As of version 0.8.5, vLLM limits the `TCPStore` socket to the private interface as configured.
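Because the root cause is a listener bound to all interfaces, it can help to check which TCP ports on a node are listening on the wildcard address. The sketch below parses IPv4 entries in the Linux /proc/net/tcp format and returns ports in LISTEN state bound to 0.0.0.0. It is an illustration, not part of vLLM or PyTorch: the function name and sample rows are hypothetical, the address decoding assumes a little-endian host, and IPv6 listeners (/proc/net/tcp6) are not covered.

```python
import socket
import struct

TCP_LISTEN = "0A"  # state code for LISTEN in /proc/net/tcp


def wildcard_listeners(proc_net_tcp: str) -> list:
    """Return sorted ports that are listening on 0.0.0.0."""
    ports = []
    for line in proc_net_tcp.splitlines():
        fields = line.split()
        # Data rows start with an index such as '0:'; this skips the header.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        if fields[3] != TCP_LISTEN:
            continue
        ip_hex, port_hex = fields[1].split(":")
        # The kernel prints the address as a host-order 32-bit hex value.
        ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
        if ip == "0.0.0.0":
            ports.append(int(port_hex, 16))
    return sorted(ports)


# Abbreviated, hypothetical sample rows (real rows carry more columns).
SAMPLE = "\n".join([
    "  sl  local_address rem_address   st",
    "   0: 00000000:1F90 00000000:0000 0A",   # 0.0.0.0:8080, LISTEN
    "   1: 0100007F:0016 00000000:0000 0A",   # 127.0.0.1:22, LISTEN
    "   2: 0501000A:7531 0901000A:C000 01",   # 10.0.1.5:30001, ESTABLISHED
])

if __name__ == "__main__":
    print(wildcard_listeners(SAMPLE))
```

On an affected node, the TCPStore port would show up in this list even when a private address was supplied through the KV transfer settings.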

Exploitation Scenario

An attacker identifies organizations running vLLM (through job postings, GitHub repos, or API fingerprinting) and finds a distributed inference cluster whose TCPStore port is reachable, either through a misconfigured security group or through internal network access gained in an earlier compromise. They connect to the exposed TCPStore and send a maliciously crafted pickled Python object through the send_obj/recv_obj interface. When the receiving worker deserializes the payload, arbitrary code executes on the inference worker node. From there, the attacker can exfiltrate model weights, read KV cache contents containing other users' prompts, establish persistence, or pivot deeper into the ML infrastructure.
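The reason a pickle payload leads to code execution is that unpickling can call a function chosen by whoever produced the bytes. The sketch below shows this with a harmless stand-in callable (int), then shows the allowlist-style Unpickler pattern described in the Python pickle documentation. It is a generic Python illustration with hypothetical class and function names; it is not vLLM's code and not the fix shipped in 0.8.5, which restricts the interface the TCPStore socket binds to.

```python
import io
import pickle


class Payload:
    """Object whose pickled form asks the loader to call a function."""

    def __reduce__(self):
        # Benign stand-in: the loader will call int("42").
        # An attacker would name a dangerous callable here instead.
        return (int, ("42",))


class DenyGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global lookup (classes and functions)."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Load plain data structures only; anything needing a global is rejected."""
    return DenyGlobalsUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    blob = pickle.dumps(Payload())
    # The result is whatever the sender-chosen callable returned,
    # not a Payload instance: the call ran during loading.
    print(type(pickle.loads(blob)))

    try:
        restricted_loads(blob)
    except pickle.UnpicklingError as error:
        print("rejected:", error)

    # Plain containers of basic types need no global lookup and still load.
    print(restricted_loads(pickle.dumps({"tokens": [1, 2, 3]})))
```

Restricting globals reduces exposure but is not a substitute for keeping the port off untrusted networks, which is why the recommended actions focus on patching and network isolation.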