CVE-2024-9053: vllm: RCE via unsafe pickle deserialization in RPC server

GHSA-cj47-qj6g-x7r4 · CRITICAL · PoC AVAILABLE · CISA: ATTEND
Published March 20, 2025
CISO Take

Any vLLM deployment running version ≤0.6.0 with the AsyncEngineRPCServer accessible from untrusted networks is critically vulnerable to unauthenticated remote code execution—an attacker only needs network access to the RPC port to fully own the inference server. Immediately firewall the RPC port (default 5570) and audit whether your LLM serving infrastructure is reachable from untrusted segments. Upgrade to a patched vLLM release as soon as one is available.

Risk Assessment

Critical risk for any organization running vLLM in production. A CVSS of 9.8 with no authentication, no user interaction, and network-level access makes this trivially exploitable by any attacker with connectivity to the RPC port. The EPSS score of roughly 10% (higher than 93% of all CVEs) indicates a meaningful likelihood of exploitation within 30 days, and the attack itself is straightforward: cloudpickle deserialization RCE requires no AI/ML knowledge, just a crafted payload. LLM inference servers typically run with elevated privileges and hold model weights, API keys, and access to downstream data systems, dramatically amplifying the blast radius beyond the initial foothold.

Affected Systems

Package   Ecosystem   Vulnerable Range   Patched
vllm      pip         <= 0.6.0           No patch

Severity & Risk

CVSS 3.1: 9.8 / 10
EPSS: 10.0% chance of exploitation within 30 days (higher than 93% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: Medium
Sophistication: Trivial
Exploitation Confidence: Medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve); EPSS exploit prediction: 10%
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV (Attack Vector): Network
AC (Attack Complexity): Low
PR (Privileges Required): None
UI (User Interaction): None
S (Scope): Unchanged
C (Confidentiality): High
I (Integrity): High
A (Availability): High

Recommended Action

  1. IMMEDIATE

    Block the vLLM RPC port (default 5570) at the firewall; this interface must never be reachable from untrusted networks. A quick reachability check is sketched after this list.

  2. PATCH

    Upgrade vLLM beyond 0.6.0; monitor the vLLM GitHub releases page for a patched version as none was available at disclosure time.

  3. NETWORK SEGMENTATION

    Place all inference servers in isolated network segments accessible only from trusted orchestration services.

  4. DETECTION

    Alert on unexpected child processes spawned by vLLM processes and anomalous outbound connections from inference hosts; both are indicators of post-exploitation activity following pickle deserialization. A minimal monitoring sketch appears under Technical Details below.

  5. AUDIT

    Verify whether AsyncEngineRPCServer is actually required in your deployment; disable it in the configuration if distributed/multi-GPU inference is not needed.
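
For steps 1 and 5, a quick reachability probe can confirm whether the RPC port is exposed from a given network position. This is an illustrative sketch, not part of vLLM's tooling; the hostnames are placeholder assumptions to replace with your own inventory.

```python
import socket
import sys

# Hosts to audit; replace with your own inference-server inventory.
HOSTS = ["inference-01.internal", "inference-02.internal"]
RPC_PORT = 5570  # vLLM AsyncEngineRPCServer default


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    exposed = [h for h in HOSTS if is_reachable(h, RPC_PORT)]
    for host in exposed:
        print(f"WARNING: {host}:{RPC_PORT} is reachable from this segment")
    # Non-zero exit if any host is exposed, so this can gate a compliance job.
    sys.exit(1 if exposed else 0)
```

Run the probe from the least-trusted segment that should have no path to the inference tier; any successful connection indicates a firewall or segmentation gap.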

CISA SSVC Assessment

Decision: Attend
Exploitation: PoC
Automatable: Yes
Technical Impact: Total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.6 - AI system security and cybersecurity controls
NIST AI RMF
MANAGE 2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2024-9053?

CVE-2024-9053 is a critical (CVSS 9.8) unauthenticated remote code execution vulnerability in vLLM versions ≤ 0.6.0. The AsyncEngineRPCServer deserializes incoming messages with cloudpickle.loads() without any sanitization, so any attacker with network access to the RPC port (default 5570) can execute arbitrary code on the inference server. Firewall the RPC port immediately and upgrade to a patched release as soon as one is available.

Is CVE-2024-9053 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2024-9053, increasing the risk of exploitation.

How to fix CVE-2024-9053?

1. IMMEDIATE: Block the vLLM RPC port (default 5570) at the firewall; this interface must never be reachable from untrusted networks.
2. PATCH: Upgrade vLLM beyond 0.6.0; monitor the vLLM GitHub releases page for a patched version, as none was available at disclosure time.
3. NETWORK SEGMENTATION: Place all inference servers in isolated network segments accessible only from trusted orchestration services.
4. DETECTION: Alert on unexpected child processes spawned by vLLM processes and anomalous outbound connections from inference hosts; both are indicators of post-exploitation activity following pickle deserialization.
5. AUDIT: Verify whether AsyncEngineRPCServer is actually required in your deployment; disable it in the configuration if distributed/multi-GPU inference is not needed.

What systems are affected by CVE-2024-9053?

This vulnerability affects deployments of vLLM ≤ 0.6.0 that expose the AsyncEngineRPCServer, spanning the following AI/ML architecture patterns: LLM inference serving, distributed model serving, model serving, AI API endpoints, and RAG pipelines.

What is the CVSS score for CVE-2024-9053?

CVE-2024-9053 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 10.02%.

Technical Details

NVD Description

vllm-project vllm version 0.6.0 contains a vulnerability in the AsyncEngineRPCServer() RPC server entrypoints. The core functionality run_server_loop() calls the function _make_handler_coro(), which directly uses cloudpickle.loads() on received messages without any sanitization. This can result in remote code execution by deserializing malicious pickle data.
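
To see why this is immediately exploitable, consider how pickle-family deserializers handle __reduce__. The sketch below is illustrative and deliberately harmless; it uses the standard pickle module (cloudpickle relies on the standard pickle protocol for loading) and prints a string instead of running a real command.

```python
import pickle


class Payload:
    # pickle calls __reduce__ during deserialization and invokes
    # whatever callable it returns, with the given arguments.
    def __reduce__(self):
        return (print, ("code executed during deserialization",))


malicious_bytes = pickle.dumps(Payload())

# This is effectively what the vulnerable server does with
# attacker-controlled bytes: the callable runs inside loads(),
# before any application logic can inspect the message.
pickle.loads(malicious_bytes)
```

An attacker substitutes os.system or similar for print; because execution happens inside loads(), no later validation can help. Deserializing untrusted bytes with pickle or cloudpickle is unsafe by design.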

Exploitation Scenario

An adversary scans for or discovers an exposed vLLM RPC endpoint (default port 5570/TCP). Using publicly documented cloudpickle exploitation techniques, they craft a malicious serialized payload containing a reverse shell or arbitrary OS command and send it directly to the AsyncEngineRPCServer. The server passes the raw bytes to cloudpickle.loads() with no validation, immediately executing the attacker's payload with the privileges of the vLLM process—typically root or a high-privileged service account in containerized deployments. From this foothold, the attacker can exfiltrate model weights and API secrets, inject manipulated responses into the live inference pipeline, pivot to connected RAG databases and orchestration systems, or commandeer GPU resources. No credentials, tokens, or prior knowledge of the target environment are required.
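
The detection guidance from step 4 can be prototyped with a simple process-tree check. This is a minimal sketch assuming the third-party psutil package is installed and that the serving process command line contains "vllm"; the expected-children list is an illustrative assumption to tune for your stack, and production detection belongs in your EDR or audit rules.

```python
import psutil

# Child process names considered normal for a vLLM worker.
# Illustrative assumptions; tune to your deployment.
EXPECTED_CHILDREN = {"python", "python3", "nvidia-smi"}


def suspicious_children():
    """Return (parent_pid, child_pid, child_name) for unexpected children."""
    findings = []
    for proc in psutil.process_iter(["pid", "name", "cmdline"]):
        cmdline = " ".join(proc.info["cmdline"] or [])
        if "vllm" not in cmdline.lower():
            continue
        try:
            children = proc.children(recursive=True)
        except psutil.NoSuchProcess:
            continue  # process exited between enumeration and inspection
        for child in children:
            try:
                if child.name() not in EXPECTED_CHILDREN:
                    findings.append((proc.pid, child.pid, child.name()))
            except psutil.NoSuchProcess:
                continue
    return findings


if __name__ == "__main__":
    for parent_pid, child_pid, name in suspicious_children():
        print(f"ALERT: vLLM pid {parent_pid} spawned unexpected child "
              f"{name} (pid {child_pid})")
```

Pair this with egress alerting: a vLLM host initiating outbound connections to unfamiliar destinations is the other high-signal indicator named in step 4.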

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

Timeline

Published
March 20, 2025
Last Modified
October 15, 2025
First Seen
March 20, 2025

Related Vulnerabilities