CVE-2024-9053: vllm: RCE via unsafe pickle deserialization in RPC server

GHSA-cj47-qj6g-x7r4 | Severity: CRITICAL | PoC available | CISA SSVC: Attend
Published March 20, 2025
CISO Take

Any vLLM deployment running version 0.6.0 or earlier with the AsyncEngineRPCServer reachable from untrusted networks is exposed to unauthenticated remote code execution: an attacker needs only network access to the RPC port to take full control of the inference server. Firewall the RPC port (default 5570) immediately and audit whether your LLM serving infrastructure is reachable from untrusted segments. Upgrade to a patched vLLM release as soon as one is available.

What is the risk?

Critical risk for any organization running vLLM in production. A CVSS score of 9.8, with a network attack vector and no authentication or user interaction required, makes this trivially exploitable by any attacker with connectivity to the RPC port. The EPSS score (1.3% on this page) suggests limited active exploitation so far, but the attack is straightforward: cloudpickle deserialization RCE requires no AI/ML knowledge, just a crafted payload. LLM inference servers typically run with elevated privileges and hold model weights, API keys, and access to downstream data systems, which extends the blast radius well beyond the initial foothold.

What systems are affected?

Package   Ecosystem   Vulnerable Range   Patched
vLLM      pip         <= 0.6.0           No patch
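To check whether a given host falls inside the vulnerable range, a short script along these lines may help. It is a minimal sketch, not an official tool: the helper names (`parse_release`, `is_affected`, `installed_vllm_version`) are hypothetical, and the parser only looks at the leading numeric part of the version, so a string such as `0.6.0.post1` is conservatively reported as affected.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Highest version in the vulnerable range listed above (<= 0.6.0).
LAST_AFFECTED = (0, 6, 0)


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release, e.g. '0.6.0.post1' -> (0, 6, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version: str) -> bool:
    """True if the version is at or below the last affected release."""
    return parse_release(version) <= LAST_AFFECTED


def installed_vllm_version() -> Optional[str]:
    """Return the installed vllm version, or None if vllm is not installed."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vllm is not installed in this environment")
    elif is_affected(found):
        print(f"vllm {found} is in the vulnerable range (<= 0.6.0)")
    else:
        print(f"vllm {found} is newer than 0.6.0")
```

Run it inside each virtual environment or container image that serves models, since the result only reflects the interpreter it runs under.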

How severe is it?

CVSS 3.1: 9.8 / 10
EPSS: 1.3% chance of exploitation in 30 days (higher than 66% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: Medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

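Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High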

What should I do?

  1. IMMEDIATE

    Block the vLLM RPC port (default 5570) at the firewall; this interface must never be reachable from untrusted networks.

  2. PATCH

    Upgrade vLLM beyond 0.6.0; monitor the vLLM GitHub releases page for a patched version as none was available at disclosure time.

  3. NETWORK SEGMENTATION

    Place all inference servers in isolated network segments accessible only from trusted orchestration services.

  4. DETECTION

    Alert on unexpected child process spawning from vLLM processes and anomalous outbound connections from inference hosts—both are indicators of post-exploitation activity following pickle deserialization.

  5. AUDIT

    Verify whether AsyncEngineRPCServer is actually required in your deployment; disable it in the configuration if distributed/multi-GPU inference is not needed.
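One way to confirm that the firewall and segmentation steps above took effect is to probe the RPC port from a host that should not be able to reach it. The following is a minimal sketch using only the standard library; the function name `rpc_port_reachable` and the example address are hypothetical, and the check only tests TCP reachability from the machine it runs on. It does not speak the RPC protocol or send any payload.

```python
import socket


def rpc_port_reachable(host: str, port: int = 5570, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical inference host; replace with an address from your inventory.
    target = "10.0.0.15"
    if rpc_port_reachable(target):
        print(f"{target}:5570 is reachable from here; check firewall rules")
    else:
        print(f"{target}:5570 is not reachable from here")
```

A result of "not reachable" from an untrusted segment is the desired outcome; repeat the probe from each segment that should be blocked.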

What does CISA's SSVC say?

Decision: Attend
Exploitation: PoC
Automatable: Yes
Technical Impact: Total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001: A.6.2.6 - AI system security and cybersecurity controls
NIST AI RMF: MANAGE 2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems
OWASP LLM Top 10: LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2024-9053?

CVE-2024-9053 is a critical remote code execution vulnerability in vLLM 0.6.0 and earlier. The AsyncEngineRPCServer calls cloudpickle.loads() on received messages without validating them, so an unauthenticated attacker with network access to the RPC port (default 5570) can execute arbitrary code on the inference server. Firewall the RPC port immediately, audit whether your LLM serving infrastructure is reachable from untrusted segments, and upgrade to a patched vLLM release as soon as one is available.

Is CVE-2024-9053 actively exploited?

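Active exploitation is not confirmed here: CISA's SSVC assessment lists the exploitation status as PoC. Proof-of-concept exploit code for CVE-2024-9053 is publicly available, however, which increases the risk of exploitation.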

How to fix CVE-2024-9053?

  1. IMMEDIATE: Block the vLLM RPC port (default 5570) at the firewall; this interface must never be reachable from untrusted networks.
  2. PATCH: Upgrade vLLM beyond 0.6.0; monitor the vLLM GitHub releases page for a patched version, as none was available at disclosure time.
  3. NETWORK SEGMENTATION: Place all inference servers in isolated network segments accessible only from trusted orchestration services.
  4. DETECTION: Alert on unexpected child process spawning from vLLM processes and anomalous outbound connections from inference hosts; both are indicators of post-exploitation activity following pickle deserialization.
  5. AUDIT: Verify whether AsyncEngineRPCServer is actually required in your deployment; disable it in the configuration if distributed/multi-GPU inference is not needed.

What systems are affected by CVE-2024-9053?

The vulnerable package is vLLM (pip), versions 0.6.0 and earlier. The vulnerability affects the following AI/ML architecture patterns: LLM inference serving, distributed model serving, model serving, AI API endpoints, and RAG pipelines.

What is the CVSS score for CVE-2024-9053?

CVE-2024-9053 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 1.27%.

What is the AI security impact?

Affected AI Architectures

LLM inference serving, distributed model serving, model serving, AI API endpoints, RAG pipelines

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0040 AI Model Inference API Access
AML.T0049 Exploit Public-Facing Application
AML.T0050 Command and Scripting Interpreter
AML.T0072 Reverse Shell

Compliance Controls Affected

EU AI Act: Art. 15
ISO 42001: A.6.2.6
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM05

What are the technical details?

Original Advisory

vllm-project vllm version 0.6.0 contains a vulnerability in the AsyncEngineRPCServer() RPC server entrypoints. The core functionality run_server_loop() calls the function _make_handler_coro(), which directly uses cloudpickle.loads() on received messages without any sanitization. This can result in remote code execution by deserializing malicious pickle data.
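To see why calling a pickle loader on untrusted bytes amounts to code execution, consider the deliberately harmless sketch below. It uses the standard-library `pickle` module rather than cloudpickle (an assumption made to keep the example dependency-free; cloudpickle emits standard pickle streams), and the class name `RunsOnLoad` is hypothetical. The object's `__reduce__` method tells the unpickler which callable to invoke, and the unpickler invokes it during `loads()`, before the caller sees any data.

```python
import os
import pickle


class RunsOnLoad:
    """Pickles as an instruction to call os.getpid() when loaded."""

    def __reduce__(self):
        # (callable, args): the unpickler will call os.getpid() with no arguments.
        return (os.getpid, ())


payload = pickle.dumps(RunsOnLoad())

# Nothing here asks for os.getpid(), yet loads() calls it while decoding.
result = pickle.loads(payload)

# The "deserialized object" is just whatever the chosen callable returned.
print(type(result).__name__, result == os.getpid())
```

Here the callable is a benign standard-library function. The same mechanism accepts any importable callable with arbitrary arguments, which is why the advisory treats unvalidated `cloudpickle.loads()` on network input as remote code execution.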

Exploitation Scenario

An adversary scans for or discovers an exposed vLLM RPC endpoint (default port 5570/TCP). Using publicly documented cloudpickle exploitation techniques, they craft a malicious serialized payload containing a reverse shell or arbitrary OS command and send it directly to the AsyncEngineRPCServer. The server passes the raw bytes to cloudpickle.loads() with no validation, immediately executing the attacker's payload with the privileges of the vLLM process—typically root or a high-privileged service account in containerized deployments. From this foothold, the attacker can exfiltrate model weights and API secrets, inject manipulated responses into the live inference pipeline, pivot to connected RAG databases and orchestration systems, or commandeer GPU resources. No credentials, tokens, or prior knowledge of the target environment are required.
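The scenario above depends on the server loading whatever globals the payload names. For illustration only, the sketch below shows the allow-list pattern described in the Python `pickle` documentation, where `find_class` is overridden to reject anything not explicitly permitted. This is not the vLLM fix, the names `AllowListUnpickler`, `restricted_loads`, and `ALLOWED_GLOBALS` are hypothetical, and an allow-list narrows the attack surface without making pickle a safe format for untrusted input.

```python
import io
import os
import pickle

# Only these (module, name) pairs may be resolved while unpickling.
ALLOWED_GLOBALS = {
    ("builtins", "complex"),
}


class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream references a global object.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowed")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but refuses globals outside ALLOWED_GLOBALS."""
    return AllowListUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    # Plain containers need no global lookup and load normally.
    print(restricted_loads(pickle.dumps({"prompt": "hi", "max_tokens": 16})))

    # A stream that references an arbitrary callable is rejected.
    try:
        restricted_loads(pickle.dumps(os.getpid))
    except pickle.UnpicklingError as exc:
        print("rejected:", exc)
```

Payloads that serialize functions by value, which is the main reason to use cloudpickle, would not pass such a filter, so moving to a data-only message format is usually the more robust direction.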

Weaknesses (CWE)

CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

  • [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
  • [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.

Source: MITRE CWE corpus.
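The two CWE mitigations above (authenticate the message with an HMAC, and populate a new object from validated data instead of deserializing arbitrary objects) can be sketched together as follows. This is an illustrative sketch under stated assumptions, not vLLM code: the function names `seal` and `unseal`, the JSON message shape, and the shared-key setup are all hypothetical.

```python
import hashlib
import hmac
import json

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(message: dict, key: bytes) -> bytes:
    """Serialize to JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(message, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def unseal(blob: bytes, key: bytes) -> dict:
    """Verify the tag first, then parse; raise ValueError if verification fails."""
    tag, body = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message authentication failed")
    # JSON yields only dicts, lists, strings, numbers, booleans, and None.
    return json.loads(body.decode("utf-8"))


if __name__ == "__main__":
    shared_key = b"replace-with-a-random-32-byte-key"  # placeholder, not a real secret
    blob = seal({"op": "generate", "max_tokens": 16}, shared_key)
    print(unseal(blob, shared_key))
```

The order matters: the tag is checked before any parsing, and the parser itself cannot invoke callables, so a forged or modified message is rejected without being interpreted.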

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
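The 9.8 base score can be reproduced from this vector using the CVSS v3.1 base-score equations. The sketch below is a partial implementation written for this advisory: it handles only Scope: Unchanged vectors (the case here), and the function names are hypothetical. The metric weights and the round-up rule follow the published v3.1 specification.

```python
import math

# CVSS v3.1 numeric weights (PR values shown are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def parse_vector(vector: str) -> dict:
    """Split 'CVSS:3.1/AV:N/...' into {'AV': 'N', ...}."""
    prefix, *pairs = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix!r}")
    return dict(pair.split(":", 1) for pair in pairs)


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    metrics = parse_vector(vector)
    if metrics["S"] != "U":
        raise NotImplementedError("this sketch covers Scope: Unchanged only")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 3.89; their sum, about 9.76, rounds up to 9.8.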

Timeline

Published: March 20, 2025
Last Modified: October 15, 2025
First Seen: March 20, 2025

Related Vulnerabilities