CVE-2025-48944: vLLM: input validation DoS crashes inference worker
GHSA-vrq3-r879-7m65 MEDIUM PoC AVAILABLE CISA: TRACK*
Any authenticated API user can crash a vLLM inference worker with a single malformed tools request, causing a service outage until the worker is manually restarted. The low-privilege access requirement makes this a realistic insider or compromised-credential threat in multi-tenant LLM deployments. Patch to vLLM 0.9.0 immediately; there is no safe workaround short of disabling tools functionality entirely.
What is the risk?
Medium-high operational risk despite the CVSS 6.5 score. Attack complexity is low and only low-privilege API credentials are required, so any user with API access can exploit it trivially. The impact is total disruption of the affected inference worker with no automatic recovery: each crash requires manual intervention. Organizations running vLLM 0.8.x with the tools API exposed face persistent availability risk from any authenticated principal, internal or external.
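Because the worker does not recover on its own, a process supervisor can shorten each outage until the upgrade is in place. The following is a minimal sketch, not a vLLM feature: `supervise` and its `on_crash` hook are hypothetical names, and it assumes the crash surfaces as a non-zero exit of the launched process.

```python
import subprocess
import sys
import time
from typing import Callable, List, Optional


def supervise(
    cmd: List[str],
    max_restarts: int = 3,
    backoff_seconds: float = 1.0,
    on_crash: Optional[Callable[[int, int], None]] = None,
) -> int:
    """Run cmd and restart it after each non-zero exit, up to max_restarts times.

    Returns the number of restarts performed. on_crash(exit_code, crash_count)
    is a hypothetical alerting hook; each call is a possible exploitation signal.
    """
    restarts = 0
    while True:
        exit_code = subprocess.call(cmd)
        if exit_code == 0:
            return restarts  # clean shutdown: do not restart
        if on_crash is not None:
            on_crash(exit_code, restarts + 1)
        if restarts >= max_restarts:
            return restarts  # restart budget exhausted: escalate to an operator
        restarts += 1
        time.sleep(backoff_seconds * restarts)  # linear backoff between restarts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python supervise.py <command that launches the worker> [args...]
    supervise(sys.argv[1:], on_crash=lambda code, n: print(f"crash #{n}: exit {code}"))
```

A restart budget matters here: an attacker can repeat the request, so unlimited restarts would hide a sustained attack rather than stop it.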
What systems are affected?
How severe is it?
What is the attack surface?
What should I do?
1) Upgrade vLLM to 0.9.0 immediately; this is the only complete fix.
2) If patching is blocked, restrict /v1/chat/completions tools functionality to trusted internal principals at the API gateway level, or disable it entirely.
3) Implement automatic worker restart via systemd watchdog, Kubernetes liveness probes, or equivalent to minimize MTTR if exploitation occurs.
4) Add API gateway or WAF rules to validate and reject structurally malformed 'pattern' and 'type' fields in tool definitions before they reach the inference worker.
5) Alert on unexpected vLLM worker process terminations and treat crashes as potential exploitation indicators.
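The gateway-side check in step 4 could look roughly like the sketch below. It assumes the OpenAI-compatible layout in which each tool carries a JSON Schema under `function.parameters`; `validate_tools` and `find_schema_problems` are hypothetical helper names. Python's `re` only approximates whatever regex dialect the backend compiles, so treat this as a coarse pre-filter, not a replacement for the upgrade.

```python
import re
from typing import Any, List

# The seven primitive type names defined by JSON Schema.
JSON_SCHEMA_TYPES = {"string", "number", "integer", "boolean", "object", "array", "null"}


def find_schema_problems(schema: Any, path: str) -> List[str]:
    """Recursively collect malformed 'pattern' and 'type' keywords in a schema."""
    problems: List[str] = []
    if isinstance(schema, list):
        for i, item in enumerate(schema):
            problems += find_schema_problems(item, f"{path}[{i}]")
        return problems
    if not isinstance(schema, dict):
        return problems
    for key, value in schema.items():
        here = f"{path}.{key}"
        if key == "properties" and isinstance(value, dict):
            # Keys under "properties" are user-chosen field names, not keywords.
            for name, sub in value.items():
                problems += find_schema_problems(sub, f"{here}.{name}")
        elif key == "pattern":
            if not isinstance(value, str):
                problems.append(f"{here}: pattern must be a string")
            else:
                try:
                    re.compile(value)
                except re.error as exc:
                    problems.append(f"{here}: invalid regex ({exc})")
        elif key == "type":
            candidates = value if isinstance(value, list) else [value]
            for t in candidates:
                if not isinstance(t, str) or t not in JSON_SCHEMA_TYPES:
                    problems.append(f"{here}: unsupported type {t!r}")
        else:
            problems += find_schema_problems(value, here)
    return problems


def validate_tools(body: dict) -> List[str]:
    """Return a list of problems; an empty list means the request may be forwarded."""
    tools = body.get("tools") or []
    if not isinstance(tools, list):
        return ["tools: must be a list"]
    problems: List[str] = []
    for i, tool in enumerate(tools):
        fn = tool.get("function") if isinstance(tool, dict) else None
        if not isinstance(fn, dict):
            problems.append(f"tools[{i}]: missing function object")
            continue
        problems += find_schema_problems(
            fn.get("parameters", {}), f"tools[{i}].function.parameters"
        )
    return problems
```

A gateway would reject the request with a client error whenever `validate_tools` returns a non-empty list. The check is deliberately conservative: it also flags `type` or `pattern` keys that appear inside literal values such as `enum` or `default`.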
What does CISA's SSVC say?
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
How is it classified?
Which compliance frameworks are affected?
This CVE is relevant to:
Frequently Asked Questions
Is CVE-2025-48944 actively exploited?
This advisory does not report exploitation in the wild. Proof-of-concept exploit code is publicly available for CVE-2025-48944, which increases the risk of exploitation.
What systems are affected by CVE-2025-48944?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, agent frameworks, function-calling pipelines, model serving APIs, RAG pipelines with tool use.
What is the CVSS score for CVE-2025-48944?
CVE-2025-48944 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.45%.
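The 6.5 figure can be reproduced from the advisory's vector, CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H. The sketch below applies the CVSS v3.1 base-score equations for unchanged scope only; `base_score` and `roundup` are illustrative names, and changed-scope vectors are rejected instead of being scored.

```python
import math
from typing import Dict

# Metric weights from the CVSS v3.1 specification (PR values are for scope unchanged).
WEIGHTS: Dict[str, Dict[str, float]] = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, using the integer method from the spec."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score of a CVSS v3.1 vector with S:U (unchanged scope)."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    metrics = dict(part.split(":", 1) for part in parts[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch handles unchanged scope only")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    return roundup(min(impact + exploitability, 10.0))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 6.5
```

The score is driven entirely by availability: with C:N and I:N, the impact term comes only from A:H, while the low-privilege requirement (PR:L) lowers exploitability compared with an unauthenticated attack.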
What is the AI security impact?
Affected AI Architectures
MITRE ATLAS Techniques
AML.T0029 Denial of AI Service
AML.T0034 Cost Harvesting
AML.T0049 Exploit Public-Facing Application
Compliance Controls Affected
What are the technical details?
Original Advisory
vLLM is an inference and serving engine for large language models (LLMs). In version 0.8.0 up to but excluding 0.9.0, the vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the tools functionality is invoked. These inputs are not validated before being compiled or parsed, causing a crash of the inference worker with a single request. The worker will remain down until it is restarted. Version 0.9.0 fixes the issue.
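The affected range stated above (0.8.0 up to but excluding 0.9.0) can be checked against an installed package with a short script. This is a sketch under assumptions: `release_tuple`, `is_affected`, and `installed_vllm_affected` are hypothetical helpers, and only the leading numeric release segment is compared, so suffixes such as `.post1` or `rc1` are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def release_tuple(version_string: str) -> Tuple[int, int, int]:
    """Extract the numeric release, padded to three parts: '0.8.5.post1' -> (0, 8, 5)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version_string.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    parts = tuple(int(p) for p in match.group(1).split("."))
    padded = (parts + (0, 0, 0))[:3]
    return (padded[0], padded[1], padded[2])


def is_affected(version_string: str) -> bool:
    """True if the release falls in [0.8.0, 0.9.0), the range given in the advisory."""
    return (0, 8, 0) <= release_tuple(version_string) < (0, 9, 0)


def installed_vllm_affected() -> Optional[bool]:
    """Check the locally installed vllm distribution; None if it is not installed."""
    try:
        return is_affected(version("vllm"))
    except PackageNotFoundError:
        return None
```

Because pre-release suffixes are dropped, a build labelled `0.9.0rc1` is reported as unaffected; whether such a build actually contains the fix is not something this check can determine.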
Exploitation Scenario
An attacker with basic API credentials—or a compromised low-privilege service account—sends a single POST request to /v1/chat/completions with a crafted 'tools' array containing a malformed 'pattern' field (e.g., an invalid regex that fails to compile) or a structurally invalid 'type' annotation. The vLLM backend attempts to compile or parse this input without prior validation, triggering an unhandled exception that terminates the inference worker process. The LLM service is fully unavailable until an operator manually restarts the worker. In a shared or multi-tenant LLM platform, a single request denies service to all downstream users of that worker. The attack requires no AI/ML expertise—only knowledge of the OpenAI-compatible tools API schema.
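The failure mode in this scenario, one unhandled compile error taking down service for everyone behind the same worker, can be illustrated without touching vLLM. In the sketch below, Python's `re.compile` stands in for the backend's pattern compiler and a simple loop stands in for the worker; both functions are hypothetical models, not vLLM code.

```python
import re
from typing import List


def serve_unvalidated(patterns: List[str]) -> List[str]:
    """Model of the vulnerable worker: the first bad pattern ends all service."""
    served: List[str] = []
    try:
        for pattern in patterns:
            re.compile(pattern)  # raises re.error on malformed input
            served.append(pattern)
    except re.error:
        # Models the worker process dying: later requests are never handled.
        pass
    return served


def serve_validated(patterns: List[str]) -> List[str]:
    """Model of the intended behavior: reject only the offending request."""
    served: List[str] = []
    for pattern in patterns:
        try:
            re.compile(pattern)
        except re.error:
            continue  # would map to a client error for this request only
        served.append(pattern)
    return served


requests = [r"[a-z]+", r"[a-z", r"\d{3}"]  # the second pattern is unterminated
print(serve_unvalidated(requests))  # only the first request is served
print(serve_validated(requests))    # the first and third requests are served
```

The difference between the two loops is where the exception is caught: per request, a malformed pattern costs only its sender a rejected call.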
Weaknesses (CWE)
CWE-20 — Improper Input Validation: The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.
- [Architecture and Design] Consider using language-theoretic security (LangSec) techniques that characterize inputs using a formal language and build "recognizers" for that language. This effectively requires parsing to be a distinct layer that effectively enforces a boundary between raw input and internal data representations, instead of allowing parser code to be scattered throughout the program, where it could be subject to errors or inconsistencies that create weaknesses. [REF-1109] [REF-1110] [REF-1111]
- [Architecture and Design] Use an input validation framework such as Struts or the OWASP ESAPI Validation API. Note that using a framework does not automatically address all input validation problems; be mindful of weaknesses that could arise from misusing the framework itself (CWE-1173).
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
Timeline
Related Vulnerabilities
CVE-2024-9053 9.8 vllm: RCE via unsafe pickle deserialization in RPC server
Same package: vllm
CVE-2024-11041 9.8 vllm: RCE via unsafe pickle deserialization in MessageQueue
Same package: vllm
CVE-2026-25960 9.8 vllm: SSRF allows internal network access
Same package: vllm
CVE-2025-47277 9.8 vLLM: RCE via exposed TCPStore in distributed inference
Same package: vllm
CVE-2025-32444 9.8 vLLM: RCE via pickle deserialization on ZeroMQ
Same package: vllm