CVE-2024-11041: vllm: RCE via unsafe pickle deserialization in MessageQueue

GHSA-5vqr-wprc-cpp7 | CRITICAL | PoC AVAILABLE | CISA: ATTEND
Published March 20, 2025
CISO Take

Any attacker with network access to a vllm v0.6.2 inference server can achieve full remote code execution with zero authentication required. This is trivially exploitable on one of the most widely deployed open-source LLM inference engines. Upgrade immediately; if patching is blocked, firewall the distributed MessageQueue ports to trusted hosts only.

What is the risk?

Critical risk for organizations running vllm for on-premises or private cloud LLM inference. The CVSS score of 9.8 reflects the worst-case profile: network-exploitable, no authentication, no user interaction, and full compromise of confidentiality, integrity, and availability. vllm powers LLaMA, Mistral, Qwen, and similar deployments at scale. Multi-node and multi-GPU distributed inference configurations are the most exposed, since the MessageQueue carries inter-process communication across nodes. An EPSS of about 1.4% suggests exploitation is not yet widespread, but the barrier is extremely low: any standard pickle payload generator produces a working exploit.

What systems are affected?

Package | Ecosystem | Vulnerable Range | Patched
vLLM | pip | <= 0.6.2 | No patch
83.4K | 130 dependents | Pushed 3d ago | 34% patched | ~32d to patch
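A quick way to triage hosts is to compare the installed vllm version against the range listed above. The following is a minimal sketch, not an official checker: the helper names (`parse_release`, `is_in_vulnerable_range`, `installed_vllm_status`) are hypothetical, and it deliberately treats suffixed builds such as `0.6.2.post1` as affected. Because this page lists no patched release, a version outside the range is not confirmed as fixed and still needs manual verification.

```python
import re
from importlib import metadata

# Upper bound of the affected range listed on this page (<= 0.6.2).
VULNERABLE_MAX = (0, 6, 2)


def parse_release(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '0.6.3rc1' -> (0, 6, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_in_vulnerable_range(version: str) -> bool:
    """True when the release segment is at or below the listed upper bound."""
    release = parse_release(version)
    # Pad short versions such as '0.6' so the tuple comparison is well defined.
    release = (release + (0, 0, 0))[:3]
    return release <= VULNERABLE_MAX


def installed_vllm_status() -> str:
    """Describe the vllm installation in the current Python environment."""
    try:
        version = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vllm is not installed in this environment"
    if is_in_vulnerable_range(version):
        return f"vllm {version} is inside the listed vulnerable range"
    return f"vllm {version} is outside the listed range; confirm the fix manually"


if __name__ == "__main__":
    print(installed_vllm_status())
```

Run it inside each serving environment (container image, virtualenv, or conda env), since the result reflects only the interpreter that executes it.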

How severe is it?

CVSS 3.1: 9.8 / 10
EPSS: 1.4% chance of exploitation in 30 days (higher than 69% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

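Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High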

What should I do?

  1. PATCH

    Upgrade vllm beyond v0.6.2—verify the fix is present in the target release.

  2. ISOLATE

    Restrict network access to vllm inter-process communication ports via firewall rules, namespace isolation, or VPC security groups to trusted hosts only.

  3. DETECT

    Monitor inference servers for unexpected outbound connections, new listening ports, and anomalous process spawning from vllm worker processes.

  4. AUDIT

    Review who has network-level access to vllm serving infrastructure and enforce least-privilege networking.

  5. WORKAROUND

    If immediate patching is blocked, wrap MessageQueue transport in an authenticated/signed layer or replace pickle with a safe serialization format (JSON, msgpack).
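The workaround in step 5 can be sketched as a small framing layer that serializes with JSON and authenticates every frame with an HMAC. This is an illustration under stated assumptions, not vllm code: `encode_message` and `decode_message` are hypothetical helpers, the shared key is assumed to be distributed to trusted workers out of band, and the sketch does not address replay protection or payloads that JSON cannot represent (for example tensors).

```python
import hashlib
import hmac
import json

# Length of an HMAC-SHA256 tag in bytes.
TAG_SIZE = hashlib.sha256().digest_size


def encode_message(obj, key: bytes) -> bytes:
    """Serialize obj as JSON and prepend an HMAC-SHA256 tag over the body."""
    body = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def decode_message(frame: bytes, key: bytes):
    """Verify the tag first, then parse the JSON body; never unpickle."""
    if len(frame) < TAG_SIZE:
        raise ValueError("frame is shorter than the authentication tag")
    tag, body = frame[:TAG_SIZE], frame[TAG_SIZE:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # compare_digest avoids leaking how many tag bytes matched via timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication tag mismatch; frame rejected")
    return json.loads(body.decode("utf-8"))


if __name__ == "__main__":
    shared_key = b"replace-with-a-random-32-byte-key"  # placeholder only
    frame = encode_message({"op": "step", "ids": [1, 2]}, shared_key)
    print(decode_message(frame, shared_key))
```

Even if an unauthenticated sender reaches the socket, a frame without a valid tag is dropped before any parsing happens, and JSON parsing itself cannot trigger code execution the way pickle.loads can.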

What does CISA's SSVC say?

Decision: Attend
Exploitation: PoC
Automatable: Yes
Technical Impact: Total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Article 15 - Accuracy, robustness and cybersecurity
ISO 42001: A.6.2.6 - AI system security controls
NIST AI RMF: MANAGE 2.4 - Residual risks are managed and monitored
OWASP LLM Top 10: LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2024-11041?

CVE-2024-11041 is a remote code execution vulnerability in vllm v0.6.2. The MessageQueue.dequeue() function passes data received over its socket to pickle.loads, so an attacker with network access to the MessageQueue can run arbitrary code on the inference server without authenticating. Upgrade immediately; if patching is blocked, firewall the distributed MessageQueue ports to trusted hosts only.

Is CVE-2024-11041 actively exploited?

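Proof-of-concept exploit code is publicly available for CVE-2024-11041, which raises the risk of exploitation. CISA's SSVC assessment records the exploitation status as proof-of-concept rather than active, and the EPSS score of about 1.4% suggests exploitation is not yet widespread.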

How to fix CVE-2024-11041?

1. PATCH: Upgrade vllm beyond v0.6.2 and verify the fix is present in the target release.
2. ISOLATE: Restrict network access to vllm inter-process communication ports via firewall rules, namespace isolation, or VPC security groups to trusted hosts only.
3. DETECT: Monitor inference servers for unexpected outbound connections, new listening ports, and anomalous process spawning from vllm worker processes.
4. AUDIT: Review who has network-level access to vllm serving infrastructure and enforce least-privilege networking.
5. WORKAROUND: If immediate patching is blocked, wrap MessageQueue transport in an authenticated/signed layer or replace pickle with a safe serialization format (JSON, msgpack).

What systems are affected by CVE-2024-11041?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, distributed multi-GPU inference, multi-node inference clusters, on-premises model serving, AI serving infrastructure.

What is the CVSS score for CVE-2024-11041?

CVE-2024-11041 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 1.41%.

What is the AI security impact?

Affected AI Architectures

LLM inference serving, distributed multi-GPU inference, multi-node inference clusters, on-premises model serving, AI serving infrastructure

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0049 Exploit Public-Facing Application
AML.T0050 Command and Scripting Interpreter
AML.T0072 Reverse Shell

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.2.6
NIST AI RMF: MANAGE 2.4
OWASP LLM Top 10: LLM05

What are the technical details?

Original Advisory

vllm-project vllm version v0.6.2 contains a vulnerability in the MessageQueue.dequeue() API function. The function uses pickle.loads to parse received sockets directly, leading to a remote code execution vulnerability. An attacker can exploit this by sending a malicious payload to the MessageQueue, causing the victim's machine to execute arbitrary code.

Exploitation Scenario

An adversary with internal network access (lateral movement from a compromised workstation, or an exposed vllm endpoint) scans for the vllm MessageQueue socket. Using standard Python tooling (pickletools, pwntools), they craft a malicious pickle payload that spawns a reverse shell and send it directly to the MessageQueue. When the vllm worker calls dequeue(), pickle.loads() deserializes the attacker-controlled bytes and executes the embedded payload, because nothing authenticates or validates the data first. The attacker lands on a GPU server with access to model weights, internal APIs, and cloud credentials in environment variables, which enables model exfiltration, lateral movement through the AI serving cluster, or persistent backdoor installation.
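The reason pickle.loads() is dangerous on untrusted input is that a pickle stream can name a callable and its arguments, and the loader invokes that callable during deserialization. The following benign sketch demonstrates the mechanism with the harmless built-in `len`; the class name `NotJustData` is hypothetical and nothing here is taken from vllm or from any published exploit.

```python
import pickle


class NotJustData:
    """An object whose pickle form is 'call this function' rather than state."""

    def __reduce__(self):
        # pickle records a (callable, args) pair; the loader calls it verbatim.
        # A harmless built-in stands in for whatever a sender might choose.
        return (len, ("attacker-chosen",))


blob = pickle.dumps(NotJustData())

# The receiver only calls loads(), yet the callable named by the sender runs.
result = pickle.loads(blob)

# The result is the return value of len(...), not a NotJustData instance.
print(type(result).__name__, result)
```

The receiving side never opts in to calling anything: the choice of callable travels inside the bytes, which is why authentication or a non-executable format has to come before deserialization.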

Weaknesses (CWE)

CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

  • [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
  • [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.

Source: MITRE CWE corpus.
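Where pickle cannot be removed outright, one way to narrow what deserialization may do is the "restricting globals" pattern described in the Python pickle documentation: subclass pickle.Unpickler and override find_class so only an allowlist of names can be resolved. This is a sketch under assumptions, not the vllm fix; the allowlist contents and the name `restricted_loads` are illustrative, and an allowlist still needs careful review because any permitted callable becomes reachable from the wire.

```python
import builtins
import io
import pickle

# Illustrative allowlist; a real deployment would tailor and audit this set.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global outside a small allowlist."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else, including os.system or builtins.eval, is rejected.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in analogue of pickle.loads() that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    print(restricted_loads(pickle.dumps([1, 2, {"window": range(3)}])))
```

Plain containers, numbers, and strings never consult find_class, so ordinary data round-trips unchanged, while a stream that names any other callable fails with UnpicklingError before the callable is invoked.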

CVSS Vector

CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
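To see how this vector yields 9.8, the base score can be recomputed from the metric weights published in the CVSS v3.x specification. The sketch below handles only Scope: Unchanged vectors, which is all this CVE needs; the function names are hypothetical, and the rounding helper follows the v3.1 definition (v3.0 rounds up slightly differently but gives the same result for this vector).

```python
import math

# Metric weights from the CVSS v3.x specification (PR values for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer method from CVSS v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged CVSS v3.x vector string."""
    # Skip the leading 'CVSS:3.x' prefix, then split each 'metric:value' pair.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 3.89; their sum of roughly 9.76 rounds up to 9.8.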

Timeline

Published: March 20, 2025
Last Modified: July 31, 2025
First Seen: March 20, 2025

Related Vulnerabilities