CVE-2024-11041 — CRITICAL (CVSS 9.8) AI Security Vulnerability

Q: Is CVE-2024-11041 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2024-11041, increasing the risk of exploitation.

Q: How to fix CVE-2024-11041?

1. PATCH: Upgrade vllm beyond v0.6.2 and verify the fix is present in the target release.
2. ISOLATE: Restrict network access to vllm inter-process communication ports via firewall rules, namespace isolation, or VPC security groups to trusted hosts only.
3. DETECT: Monitor inference servers for unexpected outbound connections, new listening ports, and anomalous process spawning from vllm worker processes.
4. AUDIT: Review who has network-level access to vllm serving infrastructure and enforce least-privilege networking.
5. WORKAROUND: If immediate patching is blocked, wrap MessageQueue transport in an authenticated/signed layer or replace pickle with a safe serialization format (JSON, msgpack).

Q: What systems are affected by CVE-2024-11041?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, distributed multi-GPU inference, multi-node inference clusters, on-premises model serving, AI serving infrastructure.

Q: What is the CVSS score for CVE-2024-11041?

CVE-2024-11041 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 5.60%.

CISO Take

Any attacker with network access to a vllm v0.6.2 inference server can achieve full remote code execution with zero authentication required. This is trivially exploitable on one of the most widely deployed open-source LLM inference engines. Upgrade immediately; if patching is blocked, firewall the distributed MessageQueue ports to trusted hosts only.

Risk Assessment

Critical risk for organizations running vllm for on-premises or private cloud LLM inference. CVSS 9.8 reflects the worst-case profile: network-exploitable, no authentication, no user interaction, full C/I/A compromise. vllm powers LLaMA, Mistral, Qwen, and similar deployments at scale. Multi-node and multi-GPU distributed inference configurations are most exposed since the MessageQueue is used for inter-process communication across nodes. EPSS of 5.6% suggests exploitation is not yet widespread, but the barrier is extremely low: any standard pickle payload generator produces a working exploit.

Affected Systems

Package	Ecosystem	Vulnerable Range	Patched
vllm	pip	—	No patch
vllm	pip	<= 0.6.2	No patch

Severity & Risk

CVSS 3.1: 9.8 / 10

EPSS: 5.6% chance of exploitation in 30 days (higher than 90% of all CVEs)

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication: Trivial

Exploitation Confidence: medium

○ CISA SSVC: Public PoC

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

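Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High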
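
As a worked check, the 9.8 base score can be reproduced from the metrics above using the CVSS v3.1 base-score equations for the scope-unchanged case. The sketch below is illustrative only; the function and constant names are chosen for this example, while the numeric weights are the ones published in the CVSS v3.1 specification.

```python
import math

# CVSS v3.1 metric weights for AV:N / AC:L / PR:N / UI:N / S:U / C:H / I:H / A:H.
AV_NETWORK = 0.85
AC_LOW = 0.77
PR_NONE = 0.85
UI_NONE = 0.85
CIA_HIGH = 0.56


def roundup(value: float) -> float:
    """Round up to one decimal place, following the CVSS v3.1 Roundup definition."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for vectors whose Scope metric is Unchanged."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)      # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_NETWORK, AC_LOW, PR_NONE, UI_NONE, CIA_HIGH, CIA_HIGH, CIA_HIGH
)
print(score)  # 9.8
```

The raw sum of impact and exploitability is about 9.76, which the specification's round-up rule lifts to 9.8.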

Recommended Action

1. PATCH: Upgrade vllm beyond v0.6.2 and verify the fix is present in the target release.
2. ISOLATE: Restrict network access to vllm inter-process communication ports via firewall rules, namespace isolation, or VPC security groups to trusted hosts only.
3. DETECT: Monitor inference servers for unexpected outbound connections, new listening ports, and anomalous process spawning from vllm worker processes.
4. AUDIT: Review who has network-level access to vllm serving infrastructure and enforce least-privilege networking.
5. WORKAROUND: If immediate patching is blocked, wrap MessageQueue transport in an authenticated/signed layer or replace pickle with a safe serialization format (JSON, msgpack).
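
The WORKAROUND step can be sketched roughly as follows: messages are serialized as JSON instead of pickle, and each frame carries an HMAC-SHA256 tag that the receiver verifies before parsing. This is a minimal illustration, not vllm code. The names pack_message and unpack_message are hypothetical, the sketch assumes payloads are JSON-serializable, and it assumes a shared secret key is distributed to trusted workers by some out-of-band means. It does not address replay protection or key rotation.

```python
import hashlib
import hmac
import json

TAG_SIZE = hashlib.sha256().digest_size  # 32-byte HMAC-SHA256 tag


def pack_message(obj, key: bytes) -> bytes:
    """Serialize obj as JSON and prepend an HMAC tag (hypothetical helper)."""
    body = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def unpack_message(frame: bytes, key: bytes):
    """Verify the HMAC tag first, then parse the JSON body (hypothetical helper)."""
    if len(frame) < TAG_SIZE:
        raise ValueError("frame too short to contain a signature")
    tag, body = frame[:TAG_SIZE], frame[TAG_SIZE:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; frame rejected")
    return json.loads(body.decode("utf-8"))


# Example round trip with a placeholder key (use a randomly generated secret in practice).
example_key = b"replace-with-a-random-32-byte-key"
frame = pack_message({"op": "step", "request_id": 7}, example_key)
message = unpack_message(frame, example_key)
```

Verifying the tag before deserializing means unauthenticated bytes never reach the parser, and JSON parsing cannot trigger code execution the way pickle.loads can.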

CISA SSVC Assessment

Decision: Attend

Exploitation: poc

Automatable: Yes

Technical Impact: total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Code Execution, Framework, Inference

AML.T0010.001 - AI Software
AML.T0049 - Exploit Public-Facing Application
AML.T0050 - Command and Scripting Interpreter
AML.T0072 - Reverse Shell

Compliance Impact

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2.6 - AI system security controls

NIST AI RMF

MANAGE 2.4 - Residual risks are managed and monitored

OWASP LLM Top 10

LLM05 - Supply Chain Vulnerabilities

Technical Details

NVD Description

vllm-project vllm version v0.6.2 contains a vulnerability in the MessageQueue.dequeue() API function. The function uses pickle.loads to parse received sockets directly, leading to a remote code execution vulnerability. An attacker can exploit this by sending a malicious payload to the MessageQueue, causing the victim's machine to execute arbitrary code.
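
To illustrate why calling pickle.loads on untrusted bytes is dangerous, the harmless sketch below shows that unpickling can invoke a callable chosen by whoever produced the bytes. The class name Marker is made up for this example, and the callable here is the benign built-in len; an attacker would substitute something harmful. This is a generic demonstration of documented pickle behaviour, not code taken from vllm.

```python
import pickle


class Marker:
    """Illustrative class whose pickled form asks the unpickler to call len("abc")."""

    def __reduce__(self):
        # pickle records (callable, args); pickle.loads later calls callable(*args).
        return (len, ("abc",))


payload = pickle.dumps(Marker())

# The receiver never asked to run len(), yet loading the bytes calls it and
# returns its result instead of a Marker instance.
result = pickle.loads(payload)
print(result)  # 3
```

Because the sender controls both the callable and its arguments, any code path that passes network data to pickle.loads effectively lets the sender run code in the receiving process.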

Exploitation Scenario

An adversary with internal network access (through lateral movement from a compromised workstation, or via an exposed vllm endpoint) scans for the vllm MessageQueue socket. Using standard Python tooling (the built-in pickle module, optionally pwntools for delivery), they craft a malicious pickle payload that spawns a reverse shell. They send the payload directly to the MessageQueue. When the vllm worker calls dequeue(), pickle.loads() executes the payload without any checks. The attacker lands on a GPU server with access to model weights, internal APIs, and cloud credentials in environment variables, enabling model exfiltration, lateral movement through the AI serving cluster, or persistent backdoor installation.
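
For triage or detection work, one possible approach is to inspect captured pickle bytes without loading them. The sketch below uses the standard library's pickletools.genops to list opcodes that import names or call objects, which plain data payloads (dicts, lists, strings, numbers) do not need. The function name suspicious_opcodes and the opcode set are assumptions made for this example. A scanner like this is an aid for analysis only and is not a substitute for removing pickle from the transport.

```python
import pickle
import pickletools

# Opcodes that import a global or construct/call an object during unpickling.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ",
    "NEWOBJ", "NEWOBJ_EX", "BUILD",
}


def suspicious_opcodes(data: bytes) -> set:
    """Return the suspicious opcode names found in data, without unpickling it."""
    found = set()
    try:
        for opcode, _arg, _pos in pickletools.genops(data):
            if opcode.name in SUSPICIOUS_OPCODES:
                found.add(opcode.name)
    except Exception:
        # Malformed streams are treated as suspicious rather than ignored.
        found.add("MALFORMED")
    return found


class Marker:
    """Benign stand-in for a crafted object: its pickle calls len("abc") on load."""

    def __reduce__(self):
        return (len, ("abc",))


plain = suspicious_opcodes(pickle.dumps({"tokens": [1, 2, 3], "id": "req-1"}))
crafted = suspicious_opcodes(pickle.dumps(Marker()))
```

Here plain comes back empty, while crafted includes REDUCE, the opcode that performs the call during loading.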