CVE-2024-9052: vLLM: RCE via pickle deserialization in distributed API

GHSA-pgr7-mhp5-fgjp CRITICAL
Published March 20, 2025
CISO Take

vLLM's distributed inference API deserializes received bytes with pickle.loads() without sanitization, enabling unauthenticated remote code execution against any exposed distributed worker node. If your organization runs multi-GPU or multi-node vLLM deployments, audit network exposure of all inter-process communication ports immediately — weak cloud or Kubernetes segmentation converts 'internal-only' APIs into public attack surfaces. No upstream patch exists for versions up to 0.8.1; enforce strict network controls now.

Risk Assessment

CVSS 9.8 (Critical); the low EPSS score of 0.33% reflects no confirmed active exploitation as of the enrichment date. The vLLM maintainer disputes that this constitutes a vulnerability in standard deployments, arguing that the internal distributed APIs are not designed for network exposure. However, in Kubernetes, cloud multi-tenant environments, and misconfigured on-premises clusters, these ports are routinely accessible. Organizations running distributed vLLM (multi-GPU tensor parallelism, pipeline parallelism) face genuine critical risk if network segmentation is not explicitly enforced. Not in CISA KEV. Risk is elevated for AI infrastructure operators and reduced for single-GPU or containerized single-node deployments.

Affected Systems

Package Ecosystem Vulnerable Range Patched
vllm pip <= 0.8.1 No patch


Severity & Risk

CVSS 3.1
9.8 / 10
EPSS
0.3%
chance of exploitation in 30 days
Higher than 53% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Moderate

Attack Surface

AV Network
AC Low
PR None
UI None
S Unchanged
C High
I High
A High

Recommended Action

6 steps
  1. IMMEDIATE

    Audit all firewall rules and security groups — vLLM distributed ports (default Ray ports 6379, 8265, plus NCCL ephemeral ports) must be unreachable from any untrusted network segment.

  2. Enforce network policies in Kubernetes to restrict pod-to-pod communication to only required vLLM worker peers.

  3. No patch available for <= 0.8.1; monitor vllm-project/vllm GitHub releases for a fix and subscribe to GHSA-pgr7-mhp5-fgjp advisories.

  4. As a workaround, wrap vLLM distributed workers behind a VPC/private subnet with no external routing.

  5. Detection: alert on unexpected TCP connections to Ray/NCCL communication ports from non-worker source IPs; monitor for anomalous subprocess spawning from vLLM worker processes.

  6. If patching is blocked, consider running vLLM in single-node mode to eliminate the distributed API attack surface until a fix lands.
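The exposure audit in step 1 can be spot-checked from an untrusted vantage point with a small TCP probe. This is a minimal sketch, not an authoritative scanner: the target host and the port list are assumptions to adapt to your deployment (NCCL's ephemeral ports in particular vary per cluster).

```python
import socket

# Default Ray coordination ports cited in step 1; extend with any
# NCCL/vLLM ports observed in your own deployment (assumption).
RAY_PORTS = [6379, 8265]

def exposed_ports(host: str, ports=RAY_PORTS, timeout: float = 1.0):
    """Return the subset of `ports` on `host` that accept a TCP connection.

    Run this from a network position that should NOT be able to reach
    the cluster; any hit indicates a segmentation gap.
    """
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            # Refused, filtered, or timed out -- the port is not reachable.
            pass
    return open_ports

if __name__ == "__main__":
    hits = exposed_ports("10.0.0.5")  # hypothetical worker IP
    if hits:
        print(f"ALERT: distributed ports reachable from here: {hits}")
```

A clean run from outside the worker subnet should return an empty list; reachable ports mean the 'internal-only' API is part of your attack surface.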

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.9.4 - Technical vulnerability management for AI systems
NIST AI RMF
MANAGE 2.2 - Risk response for AI vulnerabilities
OWASP LLM Top 10
LLM05:2025 - Improper Output Handling

Frequently Asked Questions

What is CVE-2024-9052?

vLLM's distributed inference API deserializes received bytes with pickle.loads() without sanitization, enabling unauthenticated remote code execution against any exposed distributed worker node. If your organization runs multi-GPU or multi-node vLLM deployments, audit network exposure of all inter-process communication ports immediately — weak cloud or Kubernetes segmentation converts 'internal-only' APIs into public attack surfaces. No upstream patch exists for versions up to 0.8.1; enforce strict network controls now.

Is CVE-2024-9052 actively exploited?

No confirmed active exploitation of CVE-2024-9052 has been reported, but because no patch is available, organizations should still apply the network-containment mitigations proactively.

How to fix CVE-2024-9052?

There is no patched release for versions <= 0.8.1, so remediation is network containment: make vLLM distributed ports (default Ray ports 6379 and 8265, plus NCCL ephemeral ports) unreachable from any untrusted network segment, enforce Kubernetes network policies between worker pods, place workers in a private subnet with no external routing, alert on unexpected connections to coordination ports and anomalous subprocess spawning from worker processes, and fall back to single-node mode to eliminate the distributed attack surface until a fix lands. See Recommended Action above for the full six steps.

What systems are affected by CVE-2024-9052?

This vulnerability affects the following AI/ML architecture patterns: distributed inference clusters, multi-GPU model serving, tensor-parallel LLM deployments, pipeline-parallel LLM deployments, Kubernetes ML workloads.

What is the CVSS score for CVE-2024-9052?

CVE-2024-9052 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 0.33%.

Technical Details

NVD Description

vllm-project vllm version 0.6.0 contains a vulnerability in the distributed training API. The function vllm.distributed.GroupCoordinator.recv_object() deserializes received object bytes using pickle.loads() without sanitization, leading to a remote code execution vulnerability.

Maintainer Perspective

Note that vLLM does NOT use the code as described in the report on huntr. The problem only exists if you use these internal APIs in a way that exposes them to a network as described. The vllm team was not involved in the analysis of this report and the decision to assign it a CVE.
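The danger of unsanitized pickle.loads() fits in a few lines. The sketch below is a generic illustration of pickle's __reduce__ hook, not the actual vLLM code path or a working exploit: whatever callable the payload names is invoked during deserialization, so a receiver that unpickles untrusted network bytes hands the sender code execution.

```python
import pickle

class Payload:
    # pickle consults __reduce__ when serializing; on load, the named
    # callable is invoked with the given arguments. A real attacker
    # would substitute os.system or similar. Here we use the harmless
    # str.upper to make the effect observable.
    def __reduce__(self):
        return (str.upper, ("attacker-controlled code runs here",))

malicious_bytes = pickle.dumps(Payload())

# A receiver that blindly deserializes received bytes -- as the
# advisory says recv_object() does -- executes the callable:
result = pickle.loads(malicious_bytes)
print(result)  # the callable already ran inside loads()
```

No method on Payload is ever called by the victim's code; deserialization alone triggers execution, which is why CVSS scores this with no privileges and no user interaction required.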

Exploitation Scenario

An adversary with access to the same network segment as a multi-node vLLM cluster — achieved via phishing and lateral movement, a misconfigured cloud security group, or a compromised Kubernetes namespace — scans for open Ray or vLLM coordination ports. They craft a malicious pickle payload (trivial using standard Python tooling) that executes a reverse shell command and send it to the GroupCoordinator's recv_object() endpoint. The payload deserializes without any validation and runs with the privileges of the vLLM worker process — typically with direct GPU access, all environment variables including cloud and model registry API keys, and network access to the full ML infrastructure. From this foothold the attacker exfiltrates model weights, harvests credentials for cloud storage buckets containing training data, and potentially poisons future model serving by injecting adversarial content into the inference pipeline.
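Where pickled data must still flow between trusted peers, one generic hardening pattern is a restricted Unpickler that refuses to resolve any global, since __reduce__-style payloads need to import a callable to run. This is a defensive sketch under that assumption, not a vLLM patch; it only round-trips plain containers of primitives.

```python
import io
import pickle

class NoGlobalsUnpickler(pickle.Unpickler):
    # Refuse to resolve ANY module-level name, so payloads that rely
    # on invoking an imported callable fail to load. Plain dicts,
    # lists, strings, and numbers never need find_class.
    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            f"blocked global during unpickle: {module}.{name}")

def safe_loads(data: bytes):
    """Deserialize `data`, rejecting anything that references a global."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

This trades expressiveness for safety: legitimate objects that pickle by reference will also be rejected, so it suits control-plane messages that are pure data.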

CVSS Vector

CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

Timeline

Published
March 20, 2025
Last Modified
April 9, 2025
First Seen
March 24, 2026
