CVE-2024-9052: vLLM: RCE via pickle deserialization in distributed API

GHSA-pgr7-mhp5-fgjp CRITICAL
Published March 20, 2025
CISO Take

vLLM's distributed inference API deserializes received bytes with pickle.loads() without sanitization, enabling unauthenticated remote code execution against any exposed distributed worker node. If your organization runs multi-GPU or multi-node vLLM deployments, audit network exposure of all inter-process communication ports immediately — weak cloud or Kubernetes segmentation converts 'internal-only' APIs into public attack surfaces. No upstream patch exists for versions up to 0.8.1; enforce strict network controls now.

Risk Assessment

CVSS 9.8 (Critical); the low EPSS score of 0.33% reflects no confirmed active exploitation as of the enrichment date. The vLLM maintainer disputes that this constitutes a vulnerability in standard deployments, arguing that the internal distributed APIs are not designed for network exposure. However, in Kubernetes, cloud multi-tenant environments, and misconfigured on-premises clusters, these ports are routinely accessible. Organizations running distributed vLLM (multi-GPU tensor parallelism, pipeline parallelism) face genuine critical risk if network segmentation is not explicitly enforced. Not in CISA KEV. Risk is elevated for AI infrastructure operators and reduced for single-GPU or containerized single-node deployments.

Affected Systems

Package Ecosystem Vulnerable Range Patched
vllm pip <= 0.8.1 No patch


Severity & Risk

CVSS 3.1
9.8 / 10
EPSS
0.3%
chance of exploitation in 30 days
Higher than 53% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Moderate

Attack Surface

AV Network
AC Low
PR None
UI None
S Unchanged
C High
I High
A High

Recommended Action

6 steps
  1. IMMEDIATE

    Audit all firewall rules and security groups — vLLM distributed ports (default Ray ports 6379, 8265, plus NCCL ephemeral ports) must be unreachable from any untrusted network segment.

  2. Enforce network policies in Kubernetes to restrict pod-to-pod communication to only required vLLM worker peers.

  3. No patch available for <= 0.8.1; monitor vllm-project/vllm GitHub releases for a fix and subscribe to GHSA-pgr7-mhp5-fgjp advisories.

  4. As a workaround, wrap vLLM distributed workers behind a VPC/private subnet with no external routing.

  5. Detection: alert on unexpected TCP connections to Ray/NCCL communication ports from non-worker source IPs; monitor for anomalous subprocess spawning from vLLM worker processes.

  6. If patching is blocked, consider running vLLM in single-node mode to eliminate the distributed API attack surface until a fix lands.
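The exposure audit in step 1 can be spot-checked from an untrusted vantage point with a small TCP probe. This is a minimal sketch, not an authoritative scanner: the target host and the port list are assumptions to adapt to your deployment (NCCL's ephemeral ports in particular vary per cluster).

```python
import socket

# Default Ray coordination ports cited in step 1; extend with any
# NCCL/vLLM ports observed in your own deployment (assumption).
RAY_PORTS = [6379, 8265]

def exposed_ports(host: str, ports=RAY_PORTS, timeout: float = 1.0):
    """Return the subset of `ports` on `host` that accept a TCP connection.

    Run this from a network position that should NOT be able to reach
    the cluster; any hit indicates a segmentation gap.
    """
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            # Refused, filtered, or timed out -- the port is not reachable.
            pass
    return open_ports

if __name__ == "__main__":
    hits = exposed_ports("10.0.0.5")  # hypothetical worker IP
    if hits:
        print(f"ALERT: distributed ports reachable from here: {hits}")
```

A clean run from outside the worker subnet should return an empty list; reachable ports mean the 'internal-only' API is part of your attack surface.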

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.9.4 - Technical vulnerability management for AI systems
NIST AI RMF
MANAGE 2.2 - Risk response for AI vulnerabilities
OWASP LLM Top 10
LLM05:2025 - Improper Output Handling

Frequently Asked Questions

What is CVE-2024-9052?

vLLM's distributed inference API deserializes received bytes with pickle.loads() without sanitization, enabling unauthenticated remote code execution against any exposed distributed worker node. If your organization runs multi-GPU or multi-node vLLM deployments, audit network exposure of all inter-process communication ports immediately — weak cloud or Kubernetes segmentation converts 'internal-only' APIs into public attack surfaces. No upstream patch exists for versions up to 0.8.1; enforce strict network controls now.

Is CVE-2024-9052 actively exploited?

No confirmed active exploitation of CVE-2024-9052 has been reported, but because no patch is available, organizations should still apply the network-containment mitigations proactively.

How to fix CVE-2024-9052?

There is no patched release for versions <= 0.8.1, so remediation is network containment: make vLLM distributed ports (default Ray ports 6379 and 8265, plus NCCL ephemeral ports) unreachable from any untrusted network segment, enforce Kubernetes network policies between worker pods, place workers in a private subnet with no external routing, alert on unexpected connections to coordination ports and anomalous subprocess spawning from worker processes, and fall back to single-node mode to eliminate the distributed attack surface until a fix lands. See Recommended Action above for the full six steps.

What systems are affected by CVE-2024-9052?

This vulnerability affects the following AI/ML architecture patterns: distributed inference clusters, multi-GPU model serving, tensor-parallel LLM deployments, pipeline-parallel LLM deployments, Kubernetes ML workloads.

What is the CVSS score for CVE-2024-9052?

CVE-2024-9052 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 0.33%.

Technical Details

NVD Description

vllm-project vllm version 0.6.0 contains a vulnerability in the distributed training API. The function vllm.distributed.GroupCoordinator.recv_object() deserializes received object bytes using pickle.loads() without sanitization, leading to a remote code execution vulnerability.

Maintainer Perspective

Note that vLLM does NOT use the code as described in the report on huntr. The problem only exists if you use these internal APIs in a way that exposes them to a network as described. The vllm team was not involved in the analysis of this report and the decision to assign it a CVE.
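The danger of unsanitized pickle.loads() fits in a few lines. The sketch below is a generic illustration of pickle's __reduce__ hook, not the actual vLLM code path or a working exploit: whatever callable the payload names is invoked during deserialization, so a receiver that unpickles untrusted network bytes hands the sender code execution.

```python
import pickle

class Payload:
    # pickle consults __reduce__ when serializing; on load, the named
    # callable is invoked with the given arguments. A real attacker
    # would substitute os.system or similar. Here we use the harmless
    # str.upper to make the effect observable.
    def __reduce__(self):
        return (str.upper, ("attacker-controlled code runs here",))

malicious_bytes = pickle.dumps(Payload())

# A receiver that blindly deserializes received bytes -- as the
# advisory says recv_object() does -- executes the callable:
result = pickle.loads(malicious_bytes)
print(result)  # the callable already ran inside loads()
```

No method on Payload is ever called by the victim's code; deserialization alone triggers execution, which is why CVSS scores this with no privileges and no user interaction required.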

Exploitation Scenario

An adversary with access to the same network segment as a multi-node vLLM cluster — achieved via phishing and lateral movement, a misconfigured cloud security group, or a compromised Kubernetes namespace — scans for open Ray or vLLM coordination ports. They craft a malicious pickle payload (trivial using standard Python tooling) that executes a reverse shell command and send it to the GroupCoordinator's recv_object() endpoint. The payload deserializes without any validation and runs with the privileges of the vLLM worker process — typically with direct GPU access, all environment variables including cloud and model registry API keys, and network access to the full ML infrastructure. From this foothold the attacker exfiltrates model weights, harvests credentials for cloud storage buckets containing training data, and potentially poisons future model serving by injecting adversarial content into the inference pipeline.
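Where pickled data must still flow between trusted peers, one generic hardening pattern is a restricted Unpickler that refuses to resolve any global, since __reduce__-style payloads need to import a callable to run. This is a defensive sketch under that assumption, not a vLLM patch; it only round-trips plain containers of primitives.

```python
import io
import pickle

class NoGlobalsUnpickler(pickle.Unpickler):
    # Refuse to resolve ANY module-level name, so payloads that rely
    # on invoking an imported callable fail to load. Plain dicts,
    # lists, strings, and numbers never need find_class.
    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            f"blocked global during unpickle: {module}.{name}")

def safe_loads(data: bytes):
    """Deserialize `data`, rejecting anything that references a global."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

This trades expressiveness for safety: legitimate objects that pickle by reference will also be rejected, so it suits control-plane messages that are pure data.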

CVSS Vector

CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

Timeline

Published
March 20, 2025
Last Modified
April 9, 2025
First Seen
March 24, 2026
