CVE-2024-9053: vllm: RCE via unsafe pickle deserialization in RPC server
GHSA-cj47-qj6g-x7r4 · CRITICAL · PoC AVAILABLE · CISA: ATTEND

Any vLLM deployment running version ≤0.6.0 with the AsyncEngineRPCServer accessible from untrusted networks is critically vulnerable to unauthenticated remote code execution; an attacker only needs network access to the RPC port to fully own the inference server. Immediately firewall the RPC port (default 5570) and audit whether your LLM serving infrastructure is reachable from untrusted segments. Upgrade to a patched vLLM release as soon as one is available.
Risk Assessment
Critical risk for any organization running vLLM in production. A CVSS of 9.8 with no authentication, no user interaction, and network-level access makes this trivially exploitable by any attacker with connectivity to the RPC port. The EPSS of 0.02 suggests limited active exploitation at disclosure time, but the attack itself is straightforward: cloudpickle deserialization RCE requires no AI/ML knowledge, just a crafted payload. LLM inference servers typically run with elevated privileges and hold model weights, API keys, and access to downstream data systems, dramatically amplifying the blast radius beyond the initial foothold.
Recommended Action
5 steps:

1. IMMEDIATE: Block the vLLM RPC port (default 5570) at the firewall; this interface must never be reachable from untrusted networks.
2. PATCH: Upgrade vLLM beyond 0.6.0; monitor the vLLM GitHub releases page for a patched version, as none was available at disclosure time.
3. NETWORK SEGMENTATION: Place all inference servers in isolated network segments accessible only from trusted orchestration services.
4. DETECTION: Alert on unexpected child processes spawned from vLLM processes and on anomalous outbound connections from inference hosts; both are indicators of post-exploitation activity following pickle deserialization.
5. AUDIT: Verify whether AsyncEngineRPCServer is actually required in your deployment; disable it in the configuration if distributed/multi-GPU inference is not needed.
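As part of the audit step, a quick way to verify whether the RPC port is exposed from a given network position is a plain TCP connect test. The sketch below is illustrative only (the helper name and host list are hypothetical); a production audit should use your asset inventory and scanning tooling.

```python
import socket

def rpc_port_open(host: str, port: int = 5570, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    5570 is vLLM's default RPC port. Run this from an untrusted network
    segment: any True result means the firewall rule is missing.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# Hypothetical usage against your own inference hosts:
# for host in ["10.0.0.5", "10.0.0.6"]:
#     if rpc_port_open(host):
#         print(f"WARNING: {host}:5570 reachable from this segment")
```

A connect test only proves reachability from where it runs, so repeat it from each network segment that should be untrusted.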
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Classification: Attend
Frequently Asked Questions
What is CVE-2024-9053?
CVE-2024-9053 is a critical (CVSS 9.8) unauthenticated remote code execution vulnerability in vLLM version 0.6.0 and earlier. The AsyncEngineRPCServer passes messages received on the RPC port (default 5570) directly to cloudpickle.loads() without any sanitization, so any attacker with network access to that port can execute arbitrary code on the inference server.
Is CVE-2024-9053 actively exploited?
Active exploitation had not been confirmed at disclosure time (the EPSS of 0.02 suggests limited in-the-wild activity), but proof-of-concept exploit code is publicly available for CVE-2024-9053, which increases the risk of exploitation.
How to fix CVE-2024-9053?
1. IMMEDIATE: Block the vLLM RPC port (default 5570) at the firewall; this interface must never be reachable from untrusted networks. 2. PATCH: Upgrade vLLM beyond 0.6.0; monitor the vLLM GitHub releases page for a patched version, as none was available at disclosure time. 3. NETWORK SEGMENTATION: Place all inference servers in isolated network segments accessible only from trusted orchestration services. 4. DETECTION: Alert on unexpected child processes spawned from vLLM processes and on anomalous outbound connections from inference hosts; both are indicators of post-exploitation activity following pickle deserialization. 5. AUDIT: Verify whether AsyncEngineRPCServer is actually required in your deployment; disable it in the configuration if distributed/multi-GPU inference is not needed.
What systems are affected by CVE-2024-9053?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, distributed model serving, model serving, AI API endpoints, RAG pipelines.
What is the CVSS score for CVE-2024-9053?
CVE-2024-9053 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS score at disclosure was 0.02, i.e. roughly a 2% predicted probability of exploitation.
Technical Details
NVD Description
vllm-project vllm version 0.6.0 contains a vulnerability in the AsyncEngineRPCServer() RPC server entrypoints. The core functionality run_server_loop() calls the function _make_handler_coro(), which directly uses cloudpickle.loads() on received messages without any sanitization. This can result in remote code execution by deserializing malicious pickle data.
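The danger of unsanitized pickle/cloudpickle deserialization is that the byte stream itself can instruct the loader to call arbitrary functions. The snippet below is a benign illustration of that mechanism, substituting operator.add for a destructive call; it is not the actual exploit for this CVE.

```python
import operator
import pickle  # cloudpickle.loads is affected the same way

class Payload:
    # __reduce__ tells the unpickler to call a chosen function with chosen
    # arguments at load time; a real attacker would return something like
    # (os.system, ("<command>",)) instead of this harmless call.
    def __reduce__(self):
        return (operator.add, ("pw", "ned"))

blob = pickle.dumps(Payload())   # what the attacker would send over RPC
result = pickle.loads(blob)      # the server side: cloudpickle.loads(msg)
print(result)                    # -> pwned: loads() ran the embedded call
```

Note that the receiving side never needs the Payload class at all: the blob only references the target callable, which is why any process that unpickles attacker-controlled bytes is exploitable.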
Exploitation Scenario
An adversary scans for or discovers an exposed vLLM RPC endpoint (default port 5570/TCP). Using publicly documented cloudpickle exploitation techniques, they craft a malicious serialized payload containing a reverse shell or arbitrary OS command and send it directly to the AsyncEngineRPCServer. The server passes the raw bytes to cloudpickle.loads() with no validation, immediately executing the attacker's payload with the privileges of the vLLM process—typically root or a high-privileged service account in containerized deployments. From this foothold, the attacker can exfiltrate model weights and API secrets, inject manipulated responses into the live inference pipeline, pivot to connected RAG databases and orchestration systems, or commandeer GPU resources. No credentials, tokens, or prior knowledge of the target environment are required.
Weaknesses (CWE)
CWE-502: Deserialization of Untrusted Data
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
Related Vulnerabilities
CVE-2025-32444 (9.8) vLLM: RCE via pickle deserialization on ZeroMQ (same package: vllm)
CVE-2026-25960 (9.8) vllm: SSRF allows internal network access (same package: vllm)
CVE-2025-47277 (9.8) vLLM: RCE via exposed TCPStore in distributed inference (same package: vllm)
CVE-2024-11041 (9.8) vllm: RCE via unsafe pickle deserialization in MessageQueue (same package: vllm)
CVE-2026-22807 (9.8) vllm: Code Injection enables RCE (same package: vllm)