CVE-2025-71379: vLLM: ReDoS via crafted API input causes DoS
MEDIUM
vLLM versions 0.6.3 through 0.8.x contain three ReDoS vulnerabilities in the LoRA utilities, the phi4mini tool parser, and the OpenAI-compatible chat serving endpoint, allowing any authenticated API user to trigger catastrophic regex backtracking with a single crafted request. The OpenAI-compatible endpoint is the critical risk surface: it is the primary interface in production inference deployments, so every downstream consumer (agentic workflows, RAG pipelines, chatbots) inherits the exposure. Low attack complexity and a minimal privilege requirement put exploitation within reach of any API key holder. No active exploitation is currently confirmed (not in CISA KEV, EPSS unavailable), but in multi-tenant inference environments a single malicious request can cascade to a complete service outage for all users, making the operational blast radius materially worse than the CVSS 4.3 score suggests. Upgrade to vLLM 0.9.0 or later immediately; as a temporary control, enforce strict input length limits and per-key rate caps at the API gateway before requests reach vLLM.
What is the risk?
The CVSS 4.3 medium rating understates operational risk for production inference environments. Three affected components — including the network-exposed OpenAI-compatible API — mean any authenticated user can trigger CPU exhaustion with a single crafted request. In high-throughput or multi-tenant vLLM deployments, this cascades to a complete service outage affecting all consumers sharing the inference instance. The absence of public exploits and KEV listing reduces immediate threat actor pressure, but the trivial sophistication required (no AI/ML knowledge needed, just a pathological string) expands the exploitable attacker population significantly. Organizations running vLLM in exposed SaaS inference platforms or internal AI infrastructure should treat this as high operational priority despite the medium CVSS score.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vLLM | pip | >= 0.6.3, < 0.9.0 | 0.9.0 |
If you run vLLM >= 0.6.3 and < 0.9.0, you are affected.
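A quick way to check an environment against the affected range is sketched below. It assumes vLLM was installed as the `vllm` distribution and compares only the leading numeric release, so pre-release or local version suffixes are ignored; the helper names (`parse_release`, `is_affected`, `installed_vllm_is_affected`) are illustrative and not part of vLLM.

```python
from __future__ import annotations

import re
from importlib import metadata

AFFECTED_MIN = (0, 6, 3)  # first affected release, per the advisory text
FIXED = (0, 9, 0)         # first fixed release, per the advisory text


def parse_release(version: str) -> tuple[int, int, int]:
    """Extract the leading (major, minor, patch) numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def is_affected(version: str) -> bool:
    """True if the version falls in the range >= 0.6.3 and < 0.9.0."""
    return AFFECTED_MIN <= parse_release(version) < FIXED


def installed_vllm_is_affected() -> bool | None:
    """Check the installed vllm distribution; None means it is not installed."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("installed vLLM affected:", installed_vllm_is_affected())
```

Because suffixes are dropped, a string such as `0.9.0rc1` is treated as `0.9.0`; a stricter check would use a full version parser.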
What should I do?
1. Upgrade vLLM to >= 0.9.0 (patch release addressing all three vulnerable regex components).
2. If immediate upgrade is not feasible: enforce maximum input character/token limits at the API gateway or load balancer layer before requests reach vLLM.
3. Implement per-API-key rate limiting to reduce the impact window of sustained ReDoS attempts.
4. Monitor inference server per-thread CPU utilization: sustained 100% on a single core correlated with specific API requests is a candidate indicator of exploitation.
5. For LoRA-based deployments, validate and sanitize adapter-related request parameters at ingress.
6. Audit which external parties hold API keys and revoke access for untrusted or unnecessary principals until the patch is applied.
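The input-limit control listed above could be sketched roughly as follows for a gateway that sees the raw JSON body before forwarding it to vLLM. The two limits and the function names are assumptions chosen for illustration; appropriate values depend on the models and context lengths being served, and a length cap narrows the room for pathological inputs without guaranteeing that a vulnerable regex cannot be triggered.

```python
import json

MAX_BODY_BYTES = 256 * 1024  # assumed cap on the raw request body
MAX_FIELD_CHARS = 32_000     # assumed cap on any single string in the body


def longest_string(value) -> int:
    """Return the length of the longest string (key or value) in a JSON value."""
    if isinstance(value, str):
        return len(value)
    if isinstance(value, dict):
        return max(
            (max(len(k), longest_string(v)) for k, v in value.items()),
            default=0,
        )
    if isinstance(value, list):
        return max((longest_string(v) for v in value), default=0)
    return 0


def check_request(raw_body: bytes) -> tuple:
    """Return (allowed, reason) for a request body received at the gateway."""
    if len(raw_body) > MAX_BODY_BYTES:
        return False, "body too large"
    try:
        payload = json.loads(raw_body)
        longest = longest_string(payload)
    except (ValueError, RecursionError):
        # Malformed JSON, bad encoding, or nesting too deep to walk safely.
        return False, "invalid or too deeply nested JSON"
    if longest > MAX_FIELD_CHARS:
        return False, "string field too long"
    return True, "ok"
```

A gateway would reject the request (for example with HTTP 413 or 400) when `check_request` returns `False` and forward it unchanged otherwise.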
Frequently Asked Questions
What is CVE-2025-71379?
CVE-2025-71379 covers three ReDoS vulnerabilities in vLLM versions 0.6.3 through 0.8.x, located in the LoRA utilities, the phi4mini tool parser, and the OpenAI-compatible chat serving endpoint. Any authenticated API user can trigger catastrophic regex backtracking with a single crafted request, exhausting CPU and, in multi-tenant deployments, denying service to every user of the same inference instance. The issue is fixed in vLLM 0.9.0.
Is CVE-2025-71379 actively exploited?
No confirmed active exploitation of CVE-2025-71379 has been reported, but organizations should still patch proactively.
How to fix CVE-2025-71379?
1. Upgrade vLLM to >= 0.9.0 (patch release addressing all three vulnerable regex components).
2. If immediate upgrade is not feasible: enforce maximum input character/token limits at the API gateway or load balancer layer before requests reach vLLM.
3. Implement per-API-key rate limiting to reduce the impact window of sustained ReDoS attempts.
4. Monitor inference server per-thread CPU utilization: sustained 100% on a single core correlated with specific API requests is a candidate indicator of exploitation.
5. For LoRA-based deployments, validate and sanitize adapter-related request parameters at ingress.
6. Audit which external parties hold API keys and revoke access for untrusted or unnecessary principals until the patch is applied.
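Per-API-key rate limiting (step 3) is commonly implemented as a token bucket. The sketch below is a minimal in-memory version under assumed settings; the class name, the rate, and the burst capacity are illustrative, and a real gateway would normally use its built-in limiter or a shared store instead. Rate limiting does not stop a single crafted request; it only bounds how often one key can repeat it.

```python
import time


class PerKeyRateLimiter:
    """Token bucket per API key: `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}  # api_key -> (tokens_remaining, last_update_time)

    def allow(self, api_key: str) -> bool:
        """Consume one token for this key if available; return whether to admit."""
        now = self.clock()
        tokens, last = self.buckets.get(api_key, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self.buckets[api_key] = (tokens, now)
        return allowed


if __name__ == "__main__":
    limiter = PerKeyRateLimiter(rate=0.5, capacity=3)  # assumed: 1 request / 2 s, burst of 3
    print([limiter.allow("example-key") for _ in range(5)])
```

The injectable `clock` argument exists only to make the limiter easy to test deterministically.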
What systems are affected by CVE-2025-71379?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, OpenAI-compatible API endpoints, fine-tuned model serving (LoRA), agent frameworks, and RAG pipelines.
What is the CVSS score for CVE-2025-71379?
CVE-2025-71379 has a CVSS v3.1 base score of 4.3 (MEDIUM).
What is the AI security impact?
MITRE ATLAS Techniques
- AML.T0029: Denial of AI Service
- AML.T0034.001: Resource-Intensive Queries
- AML.T0040: AI Model Inference API Access
- AML.T0049: Exploit Public-Facing Application
What are the technical details?
Original Advisory
vLLM versions >= 0.6.3 and < 0.9.0 contain multiple regular expression denial of service (ReDoS) vulnerabilities. Several regex patterns — in vllm/lora/utils.py, the phi4mini tool parser, and the OpenAI-compatible serving chat endpoint — are susceptible to catastrophic backtracking. An attacker submitting crafted input with nested or repeated structures can trigger severe CPU consumption and performance degradation, resulting in denial of service.
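To make catastrophic backtracking concrete, the sketch below times a textbook nested-quantifier pattern against an equivalent flat one. The pattern `^(a+)+$` is a generic illustration and is not one of the vLLM patterns, which the advisory does not quote; the helper name `match_seconds` is likewise illustrative.

```python
import re
import time

# Textbook vulnerable shape: a quantified group whose body is itself quantified.
NESTED = re.compile(r"^(a+)+$")
# Accepts exactly the same strings, but without the nested quantifier.
FLAT = re.compile(r"^a+$")


def match_seconds(pattern, text):
    """Return (matched, elapsed_seconds) for a single match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # A trailing "!" forces the match to fail, so a backtracking engine
    # tries every way of splitting the run of "a" characters between groups.
    for n in (14, 16, 18, 20):
        text = "a" * n + "!"
        _, slow = match_seconds(NESTED, text)
        _, fast = match_seconds(FLAT, text)
        print(f"n={n:2d}  nested={slow:.4f}s  flat={fast:.6f}s")
```

With a backtracking engine such as CPython's `re`, the nested pattern's time typically grows roughly exponentially with `n`, while the flat pattern stays effectively constant at these sizes. Keep `n` small when experimenting.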
Exploitation Scenario
An attacker with a low-privilege API key (a free trial user, a compromised downstream application credential, or an insider) sends a single POST to the vLLM /v1/chat/completions endpoint with a message body containing deeply nested or pathologically repeated string structures designed to trigger catastrophic backtracking in the vulnerable regex. The chat serving layer's regex consumes 100% of a CPU core for an extended duration, causing inference request queue depth to spike. Legitimate requests time out. In a shared multi-tenant inference deployment, that single low-privilege request denies service to all users. The attacker repeats the request at low frequency to sustain the outage with minimal effort. No ML knowledge, no model access, and no specialized tooling are required beyond the endpoint URL, a valid API key, and a crafted payload string.
Weaknesses (CWE)
CWE-1333 — Inefficient Regular Expression Complexity: The product uses a regular expression with a worst-case computational complexity that is inefficient and possibly exponential.
- [Architecture and Design] Use regular expressions that do not support backtracking, e.g. by removing nested quantifiers.
- [System Configuration] Set backtracking limits in the configuration of the regular expression implementation, such as PHP's pcre.backtrack_limit. Also consider limits on execution time for the process.
Source: MITRE CWE corpus.
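Python's standard `re` module does not expose a backtracking limit comparable to PHP's `pcre.backtrack_limit`, so the execution-time mitigation has to be applied from outside the match. One possible approach, sketched here under that assumption, is to run the match in a child interpreter and stop it after a time budget. The helper name `search_with_timeout` and the budgets are illustrative, and spawning a process per match is far too costly for a hot request path; rewriting the pattern remains the better fix.

```python
import subprocess
import sys

# Source of the child process: read the pattern from argv and the text
# from stdin, then report whether the pattern matches anywhere.
_CHILD_SOURCE = (
    "import re, sys\n"
    "pattern = sys.argv[1]\n"
    "text = sys.stdin.read()\n"
    "print('1' if re.search(pattern, text) else '0')\n"
)


def search_with_timeout(pattern, text, timeout_s):
    """Return True or False for match or no match, or None on timeout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD_SOURCE, pattern],
            input=text,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        raise ValueError(f"regex worker failed: {result.stderr.strip()}")
    return result.stdout.strip() == "1"


if __name__ == "__main__":
    print(search_with_timeout(r"^a+$", "aaaa", 5.0))
    print(search_with_timeout(r"^(a+)+$", "a" * 40 + "!", 0.5))
```

A caller would treat a `None` result as a rejected input rather than retrying it.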
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:L
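The 4.3 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The sketch below follows the metric weights and the Roundup function from the FIRST CVSS v3.1 specification; the function names are illustrative, and temporal and environmental metrics are not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is Unchanged or Changed.
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.50},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a base vector string."""
    m = dict(item.split(":") for item in vector.split("/"))
    scope = m["S"]
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:L"))
```

For this vector the impact sub-score is 6.42 × 0.22 ≈ 1.41 and the exploitability sub-score is ≈ 2.84; their sum, ≈ 4.25, rounds up to 4.3. The low availability impact (A:L) is what keeps the score in the medium band.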
Related Vulnerabilities
| CVE | CVSS | Summary | Relation |
|---|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server | Same package: vllm |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue | Same package: vllm |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference | Same package: vllm |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access | Same package: vllm |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ | Same package: vllm |