vLLM's distributed inference API deserializes received bytes with pickle.loads() without sanitization, enabling unauthenticated remote code execution against any exposed distributed worker node. If your organization runs multi-GPU or multi-node vLLM deployments, audit network exposure of all inter-process communication ports immediately — weak cloud or Kubernetes segmentation converts 'internal-only' APIs into public attack surfaces. No upstream patch exists for versions up to 0.8.1; enforce strict network controls now.
Risk Assessment
CVSS 9.8 (Critical), but an EPSS score of 0.00327 (0.33%) suggests exploitation in the wild is unlikely, and no active exploitation has been confirmed as of the enrichment date. The vLLM maintainer disputes that this constitutes a vulnerability in standard deployments, arguing that internal distributed APIs are not designed for network exposure. However, in Kubernetes, cloud multi-tenant environments, and misconfigured on-premises clusters, these ports are routinely accessible. Organizations running distributed vLLM (multi-GPU tensor parallelism, pipeline parallelism) are at genuine critical risk if network segmentation is not explicitly enforced. This CVE is not in the CISA KEV catalog. Risk is elevated for AI infrastructure operators and reduced for single-GPU or containerized single-node deployments.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vllm | pip | <= 0.8.1 | No patch |
If you run vllm at version 0.8.1 or earlier in a distributed (multi-GPU or multi-node) configuration, you are affected.
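As a quick triage aid, a minimal sketch for checking an installed vllm version against the vulnerable range. The helper name `is_affected` is illustrative, and the parsing assumes plain `major.minor.patch` version strings (pre-release suffixes such as `rc1` are not handled):

```python
from importlib import metadata

def is_affected(version: str) -> bool:
    """Return True if a plain major.minor.patch vllm version string
    falls in the vulnerable range (<= 0.8.1)."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts <= (0, 8, 1)

if __name__ == "__main__":
    try:
        v = metadata.version("vllm")  # raises if vllm is not installed
        print(f"vllm {v} in vulnerable range: {is_affected(v)}")
    except metadata.PackageNotFoundError:
        print("vllm is not installed in this environment")
```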
Recommended Action
6 steps:
1. IMMEDIATE: Audit all firewall rules and security groups — vLLM distributed ports (default Ray ports 6379, 8265, plus NCCL ephemeral ports) must be unreachable from any untrusted network segment.
2. Enforce network policies in Kubernetes to restrict pod-to-pod communication to only the required vLLM worker peers.
3. No patch is available for <= 0.8.1; monitor vllm-project/vllm GitHub releases for a fix and subscribe to GHSA-pgr7-mhp5-fgjp advisories.
4. As a workaround, place vLLM distributed workers in a VPC/private subnet with no external routing.
5. Detection: alert on unexpected TCP connections to Ray/NCCL communication ports from non-worker source IPs, and monitor for anomalous subprocess spawning from vLLM worker processes.
6. If patching is blocked, consider running vLLM in single-node mode to eliminate the distributed API attack surface until a fix lands.
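For step 2, a Kubernetes NetworkPolicy restricting ingress to vLLM workers might look like the following sketch. The namespace, pod labels, and port list are assumptions to adapt to your deployment; NCCL's ephemeral ports cannot be pinned this way and still require segment-level controls:

```yaml
# Sketch only: namespace, labels, and ports are assumed values.
# Allows ingress to vLLM workers only from peer workers on Ray ports.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vllm-worker-isolation
  namespace: ml-serving          # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: vllm-worker           # assumed worker pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: vllm-worker   # only peer workers may connect
      ports:
        - protocol: TCP
          port: 6379             # Ray GCS (default)
        - protocol: TCP
          port: 8265             # Ray dashboard (default)
```

Because the `podSelector`-only `from` clause scopes traffic to the same namespace, pair this with a default-deny ingress policy so any port not listed is unreachable.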
Frequently Asked Questions
What is CVE-2024-9052?
CVE-2024-9052 is a critical remote code execution flaw in vLLM's distributed inference API: GroupCoordinator.recv_object() deserializes received bytes with pickle.loads() without sanitization, so any attacker who can reach a distributed worker's communication port can execute arbitrary code on that node. No upstream patch exists for versions up to 0.8.1, making strict network controls the primary mitigation.
Is CVE-2024-9052 actively exploited?
No confirmed active exploitation of CVE-2024-9052 has been reported (EPSS 0.33%). However, because no patch exists, organizations should proactively enforce network segmentation around distributed vLLM deployments rather than wait for evidence of attacks.
How to fix CVE-2024-9052?
1. IMMEDIATE: Audit all firewall rules and security groups — vLLM distributed ports (default Ray ports 6379, 8265, plus NCCL ephemeral ports) must be unreachable from any untrusted network segment.
2. Enforce network policies in Kubernetes to restrict pod-to-pod communication to only the required vLLM worker peers.
3. No patch is available for <= 0.8.1; monitor vllm-project/vllm GitHub releases for a fix and subscribe to GHSA-pgr7-mhp5-fgjp advisories.
4. As a workaround, place vLLM distributed workers in a VPC/private subnet with no external routing.
5. Detection: alert on unexpected TCP connections to Ray/NCCL communication ports from non-worker source IPs, and monitor for anomalous subprocess spawning from vLLM worker processes.
6. If patching is blocked, consider running vLLM in single-node mode to eliminate the distributed API attack surface until a fix lands.
What systems are affected by CVE-2024-9052?
This vulnerability affects the following AI/ML architecture patterns: distributed inference clusters, multi-GPU model serving, tensor-parallel LLM deployments, pipeline-parallel LLM deployments, Kubernetes ML workloads.
What is the CVSS score for CVE-2024-9052?
CVE-2024-9052 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 0.33%.
Technical Details
NVD Description
vllm-project vllm version 0.6.0 contains a vulnerability in the distributed training API. The function vllm.distributed.GroupCoordinator.recv_object() deserializes received object bytes using pickle.loads() without sanitization, leading to a remote code execution vulnerability.
Maintainer Perspective
Note that vLLM does NOT use the code as described in the report on huntr. The problem only exists if you use these internal APIs in a way that exposes them to a network as described. The vllm team was not involved in the analysis of this report and the decision to assign it a CVE.
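The root cause is that pickle.loads() resolves and invokes arbitrary globals embedded in the byte stream. A generic mitigation pattern (not vLLM's actual code, just a sketch of the standard hardening technique) is a restricted Unpickler that refuses to resolve any global, which defeats `__reduce__`-based gadget payloads while still accepting plain data:

```python
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    """Refuse to resolve any global during unpickling, so payloads
    built with __reduce__ (e.g. referencing os.system) fail to load."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            f"blocked global during unpickling: {module}.{name}"
        )

def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain builtin data round-trips fine (no globals are referenced)...
print(safe_loads(pickle.dumps({"tokens": [1, 2, 3]})))

# ...but a gadget payload is rejected instead of executed.
class Gadget:
    def __reduce__(self):
        return (print, ("this would have executed",))

try:
    safe_loads(pickle.dumps(Gadget()))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

A production variant would typically allowlist specific safe classes in find_class rather than block everything, or avoid pickle entirely in favor of a schema-bound format such as JSON or msgpack.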
Exploitation Scenario
An adversary with access to the same network segment as a multi-node vLLM cluster — achieved via phishing and lateral movement, a misconfigured cloud security group, or a compromised Kubernetes namespace — scans for open Ray or vLLM coordination ports. They craft a malicious pickle payload (trivial using standard Python tooling) that executes a reverse shell command and send it to the GroupCoordinator's recv_object() endpoint. The payload deserializes without any validation and runs with the privileges of the vLLM worker process — typically with direct GPU access, all environment variables including cloud and model registry API keys, and network access to the full ML infrastructure. From this foothold the attacker exfiltrates model weights, harvests credentials for cloud storage buckets containing training data, and potentially poisons future model serving by injecting adversarial content into the inference pipeline.
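To illustrate why the scenario above describes payload construction as trivial: any object whose `__reduce__` returns a callable and its arguments has that callable executed at pickle.loads() time. The sketch below uses a harmless print as a stand-in for the attacker's command:

```python
import pickle

class Payload:
    def __reduce__(self):
        # Harmless stand-in: an attacker would return something like
        # (os.system, ("<reverse shell command>",)) instead of print.
        return (print, ("code executed during pickle.loads()",))

malicious_bytes = pickle.dumps(Payload())

# The receiver only has to call pickle.loads() for the callable to run;
# no method on the resulting object is ever invoked.
result = pickle.loads(malicious_bytes)
```

This is why "deserializes without validation" equates directly to remote code execution: the victim's single pickle.loads() call is the execution step.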
Weaknesses (CWE)
CVSS Vector
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
References
- github.com/advisories/GHSA-pgr7-mhp5-fgjp
- github.com/github/advisory-database/pull/5444
- github.com/vllm-project/vllm/blob/32e7db25365415841ebc7c4215851743fbb1bad1/vllm/distributed/parallel_state.py
- github.com/vllm-project/vllm/blob/v0.8.1/vllm/distributed/parallel_state.py
- huntr.com/bounties/ea75728f-4efe-4a3d-9f53-33f2c908e9f8
- nvd.nist.gov/vuln/detail/CVE-2024-9052
Timeline
Related Vulnerabilities
- CVE-2024-9053 (9.8) vllm: RCE via unsafe pickle deserialization in RPC server (same package: vllm)
- CVE-2026-25960 (9.8) vllm: SSRF allows internal network access (same package: vllm)
- CVE-2025-47277 (9.8) vLLM: RCE via exposed TCPStore in distributed inference (same package: vllm)
- CVE-2024-11041 (9.8) vllm: RCE via unsafe pickle deserialization in MessageQueue (same package: vllm)
- CVE-2025-32444 (9.8) vLLM: RCE via pickle deserialization on ZeroMQ (same package: vllm)