CVE-2024-9053: vllm: RCE via unsafe pickle deserialization in RPC server

GHSA-cj47-qj6g-x7r4 · CRITICAL · PoC AVAILABLE · CISA: ATTEND
Published March 20, 2025
CISO Take

Any vLLM deployment running version ≤0.6.0 with the AsyncEngineRPCServer accessible from untrusted networks is critically vulnerable to unauthenticated remote code execution—an attacker only needs network access to the RPC port to fully own the inference server. Immediately firewall the RPC port (default 5570) and audit whether your LLM serving infrastructure is reachable from untrusted segments. Upgrade to a patched vLLM release as soon as one is available.

Risk Assessment

Critical risk for any organization running vLLM in production. A CVSS of 9.8 with no authentication, no user interaction, and network-level access makes this trivially exploitable by any attacker with connectivity to the RPC port. The EPSS score of roughly 10% (higher than 93% of all CVEs) indicates a meaningful likelihood of exploitation within 30 days, and the attack itself is straightforward: cloudpickle deserialization RCE requires no AI/ML knowledge, just a crafted payload. LLM inference servers typically run with elevated privileges and hold model weights, API keys, and access to downstream data systems, dramatically amplifying the blast radius beyond the initial foothold.

Affected Systems

Package   Ecosystem   Vulnerable Range   Patched
vllm      pip         <= 0.6.0           No patch

Severity & Risk

CVSS 3.1: 9.8 / 10
EPSS: 10.0% chance of exploitation within 30 days (higher than 93% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: Medium
Sophistication: Trivial
Exploitation Confidence: Medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve); EPSS exploit prediction: 10%
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV (Attack Vector): Network
AC (Attack Complexity): Low
PR (Privileges Required): None
UI (User Interaction): None
S (Scope): Unchanged
C (Confidentiality): High
I (Integrity): High
A (Availability): High

Recommended Action

  1. IMMEDIATE

    Block the vLLM RPC port (default 5570) at the firewall; this interface must never be reachable from untrusted networks. A quick reachability check is sketched after this list.

  2. PATCH

    Upgrade vLLM beyond 0.6.0; monitor the vLLM GitHub releases page for a patched version as none was available at disclosure time.

  3. NETWORK SEGMENTATION

    Place all inference servers in isolated network segments accessible only from trusted orchestration services.

  4. DETECTION

    Alert on unexpected child processes spawned by vLLM processes and anomalous outbound connections from inference hosts; both are indicators of post-exploitation activity following pickle deserialization. A minimal monitoring sketch appears under Technical Details below.

  5. AUDIT

    Verify whether AsyncEngineRPCServer is actually required in your deployment; disable it in the configuration if distributed/multi-GPU inference is not needed.
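
For steps 1 and 5, a quick reachability probe can confirm whether the RPC port is exposed from a given network position. This is an illustrative sketch, not part of vLLM's tooling; the hostnames are placeholder assumptions to replace with your own inventory.

```python
import socket
import sys

# Hosts to audit; replace with your own inference-server inventory.
HOSTS = ["inference-01.internal", "inference-02.internal"]
RPC_PORT = 5570  # vLLM AsyncEngineRPCServer default


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    exposed = [h for h in HOSTS if is_reachable(h, RPC_PORT)]
    for host in exposed:
        print(f"WARNING: {host}:{RPC_PORT} is reachable from this segment")
    # Non-zero exit if any host is exposed, so this can gate a compliance job.
    sys.exit(1 if exposed else 0)
```

Run the probe from the least-trusted segment that should have no path to the inference tier; any successful connection indicates a firewall or segmentation gap.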

CISA SSVC Assessment

Decision: Attend
Exploitation: PoC
Automatable: Yes
Technical Impact: Total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.6 - AI system security and cybersecurity controls
NIST AI RMF
MANAGE 2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2024-9053?

CVE-2024-9053 is a critical (CVSS 9.8) unauthenticated remote code execution vulnerability in vLLM versions ≤ 0.6.0. The AsyncEngineRPCServer deserializes incoming messages with cloudpickle.loads() without any sanitization, so any attacker with network access to the RPC port (default 5570) can execute arbitrary code on the inference server. Firewall the RPC port immediately and upgrade to a patched release as soon as one is available.

Is CVE-2024-9053 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2024-9053, increasing the risk of exploitation.

How to fix CVE-2024-9053?

1. IMMEDIATE: Block the vLLM RPC port (default 5570) at the firewall; this interface must never be reachable from untrusted networks.
2. PATCH: Upgrade vLLM beyond 0.6.0; monitor the vLLM GitHub releases page for a patched version, as none was available at disclosure time.
3. NETWORK SEGMENTATION: Place all inference servers in isolated network segments accessible only from trusted orchestration services.
4. DETECTION: Alert on unexpected child processes spawned by vLLM processes and anomalous outbound connections from inference hosts; both are indicators of post-exploitation activity following pickle deserialization.
5. AUDIT: Verify whether AsyncEngineRPCServer is actually required in your deployment; disable it in the configuration if distributed/multi-GPU inference is not needed.

What systems are affected by CVE-2024-9053?

This vulnerability affects deployments of vLLM ≤ 0.6.0 that expose the AsyncEngineRPCServer, spanning the following AI/ML architecture patterns: LLM inference serving, distributed model serving, model serving, AI API endpoints, and RAG pipelines.

What is the CVSS score for CVE-2024-9053?

CVE-2024-9053 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 10.02%.

Technical Details

NVD Description

vllm-project vllm version 0.6.0 contains a vulnerability in the AsyncEngineRPCServer() RPC server entrypoints. The core functionality run_server_loop() calls the function _make_handler_coro(), which directly uses cloudpickle.loads() on received messages without any sanitization. This can result in remote code execution by deserializing malicious pickle data.
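
To see why this is immediately exploitable, consider how pickle-family deserializers handle __reduce__. The sketch below is illustrative and deliberately harmless; it uses the standard pickle module (cloudpickle relies on the standard pickle protocol for loading) and prints a string instead of running a real command.

```python
import pickle


class Payload:
    # pickle calls __reduce__ during deserialization and invokes
    # whatever callable it returns, with the given arguments.
    def __reduce__(self):
        return (print, ("code executed during deserialization",))


malicious_bytes = pickle.dumps(Payload())

# This is effectively what the vulnerable server does with
# attacker-controlled bytes: the callable runs inside loads(),
# before any application logic can inspect the message.
pickle.loads(malicious_bytes)
```

An attacker substitutes os.system or similar for print; because execution happens inside loads(), no later validation can help. Deserializing untrusted bytes with pickle or cloudpickle is unsafe by design.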

Exploitation Scenario

An adversary scans for or discovers an exposed vLLM RPC endpoint (default port 5570/TCP). Using publicly documented cloudpickle exploitation techniques, they craft a malicious serialized payload containing a reverse shell or arbitrary OS command and send it directly to the AsyncEngineRPCServer. The server passes the raw bytes to cloudpickle.loads() with no validation, immediately executing the attacker's payload with the privileges of the vLLM process—typically root or a high-privileged service account in containerized deployments. From this foothold, the attacker can exfiltrate model weights and API secrets, inject manipulated responses into the live inference pipeline, pivot to connected RAG databases and orchestration systems, or commandeer GPU resources. No credentials, tokens, or prior knowledge of the target environment are required.
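
The detection guidance from step 4 can be prototyped with a simple process-tree check. This is a minimal sketch assuming the third-party psutil package is installed and that the serving process command line contains "vllm"; the expected-children list is an illustrative assumption to tune for your stack, and production detection belongs in your EDR or audit rules.

```python
import psutil

# Child process names considered normal for a vLLM worker.
# Illustrative assumptions; tune to your deployment.
EXPECTED_CHILDREN = {"python", "python3", "nvidia-smi"}


def suspicious_children():
    """Return (parent_pid, child_pid, child_name) for unexpected children."""
    findings = []
    for proc in psutil.process_iter(["pid", "name", "cmdline"]):
        cmdline = " ".join(proc.info["cmdline"] or [])
        if "vllm" not in cmdline.lower():
            continue
        try:
            children = proc.children(recursive=True)
        except psutil.NoSuchProcess:
            continue  # process exited between enumeration and inspection
        for child in children:
            try:
                if child.name() not in EXPECTED_CHILDREN:
                    findings.append((proc.pid, child.pid, child.name()))
            except psutil.NoSuchProcess:
                continue
    return findings


if __name__ == "__main__":
    for parent_pid, child_pid, name in suspicious_children():
        print(f"ALERT: vLLM pid {parent_pid} spawned unexpected child "
              f"{name} (pid {child_pid})")
```

Pair this with egress alerting: a vLLM host initiating outbound connections to unfamiliar destinations is the other high-signal indicator named in step 4.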

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

Timeline

Published
March 20, 2025
Last Modified
October 15, 2025
First Seen
March 20, 2025

Related Vulnerabilities