CVE-2025-71379: vLLM ReDoS — MEDIUM

CISO Take

vLLM versions 0.6.3 through 0.8.x contain three ReDoS vulnerabilities: one in the LoRA utilities, one in the phi4mini tool parser, and one in the OpenAI-compatible chat serving endpoint. Any authenticated API user can trigger catastrophic regex backtracking with a single crafted request. The OpenAI-compatible endpoint is the most important risk surface because it is the primary interface in production inference deployments, so every downstream consumer (agentic workflows, RAG pipelines, chatbots) inherits the exposure. Attack complexity is low and only minimal privileges are required, which puts exploitation within reach of any API key holder. No active exploitation is currently confirmed (not in CISA KEV, EPSS unavailable), but in multi-tenant inference environments a single malicious request can cascade into a service outage for all users, making the operational blast radius materially worse than the CVSS 4.3 score suggests. Upgrade to vLLM 0.9.0 or later immediately; as a temporary control, enforce strict input length limits and per-key rate caps at the API gateway before requests reach vLLM.

Sources: NVD, GitHub Advisory, ATLAS

What is the risk?

The CVSS 4.3 medium rating understates operational risk for production inference environments. Three affected components — including the network-exposed OpenAI-compatible API — mean any authenticated user can trigger CPU exhaustion with a single crafted request. In high-throughput or multi-tenant vLLM deployments, this cascades to a complete service outage affecting all consumers sharing the inference instance. The absence of public exploits and KEV listing reduces immediate threat actor pressure, but the trivial sophistication required (no AI/ML knowledge needed, just a pathological string) expands the exploitable attacker population significantly. Organizations running vLLM in exposed SaaS inference platforms or internal AI infrastructure should treat this as high operational priority despite the medium CVSS score.

How does the attack unfold?

Initial Access

Attacker obtains low-privilege API credentials for the vLLM inference server, such as a trial API key, a leaked application credential, or a key issued to an untrusted user.

AML.T0040

Payload Delivery

Attacker submits a crafted POST request to /v1/chat/completions containing a message body with deeply nested or pathologically repeated string patterns targeting the vulnerable regex in the chat serving layer.

AML.T0049

Resource Exhaustion

The vulnerable regex in the OpenAI-compatible serving endpoint triggers catastrophic backtracking, pinning a CPU core at 100% utilization for an extended period with no model inference completing.

AML.T0034.001

Service Disruption

Legitimate inference requests time out or queue indefinitely, causing complete denial of service for all users of the vLLM instance and all downstream AI applications — agents, RAG pipelines, and chatbots — that depend on it.

AML.T0029

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
vLLM	pip	>= 0.6.3, < 0.9.0	0.9.0

Deployments running vLLM >= 0.6.3 and < 0.9.0 are affected.
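A quick way to check a host against the affected range is to compare the installed version with the bounds given in the advisory text (>= 0.6.3 and < 0.9.0). The sketch below is a minimal illustration, not an official tool: the function names are hypothetical, and it deliberately ignores pre-release and post-release suffixes, so a string such as 0.9.0rc1 is treated as 0.9.0.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Bounds taken from the advisory text: affected if >= 0.6.3 and < 0.9.0.
FIRST_AFFECTED = (0, 6, 3)
FIRST_PATCHED = (0, 9, 0)


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Extract the leading major.minor[.patch] numbers from a version string.

    Simplification: suffixes such as 'rc1' or '.post1' are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_affected(version: str) -> bool:
    """Return True if the version falls inside the advisory's affected range."""
    return FIRST_AFFECTED <= release_tuple(version) < FIRST_PATCHED


def installed_vllm_is_affected() -> Optional[bool]:
    """Check the locally installed vllm package; None if it is not installed."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("installed vllm affected:", installed_vllm_is_affected())
```

Running the script on each inference host gives a first-pass inventory; container images and pinned requirement files still need to be checked separately.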

How severe is it?

CVSS 3.1

4.3 / 10

EPSS

N/A

Exploitation Status

No known exploitation

Sophistication

Trivial

What is the attack surface?

AV Network

AC Low

PR Low

UI None

S Unchanged

C None

I None

A Low
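The metric values listed above combine into the 4.3 base score through the CVSS v3.1 base equations for unchanged scope. The sketch below reproduces that arithmetic using the metric weights and Roundup function from the CVSS v3.1 specification; the function and variable names are illustrative only, and the changed-scope branch is omitted because this vector does not need it.

```python
import math

# CVSS v3.1 metric weights (Privileges Required values are for unchanged scope).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest number with one decimal that is >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a vector with Scope: Unchanged."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR_UNCHANGED[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:N / AC:L / PR:L / UI:N / S:U / C:N / I:N / A:L
score = base_score_scope_unchanged("N", "L", "L", "N", "N", "N", "L")
print(score)  # 4.3
```

The only non-zero impact term is Availability: Low, which is why the score stays in the medium band even though the exploitability terms are close to their maximum.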

What should I do?

1. Upgrade vLLM to >= 0.9.0 (patch release addressing all three vulnerable regex components).
2. If immediate upgrade is not feasible: enforce maximum input character/token limits at the API gateway or load balancer layer before requests reach vLLM.
3. Implement per-API-key rate limiting to reduce the impact window of sustained ReDoS attempts.
4. Monitor inference server per-thread CPU utilization; sustained 100% on a single core correlated with specific API requests is a candidate indicator of exploitation.
5. For LoRA-based deployments, validate and sanitize adapter-related request parameters at ingress.
6. Audit which external parties hold API keys and revoke access for untrusted or unnecessary principals until the patch is applied.
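The input-limit and per-key rate-limit controls above could be sketched as a small admission check that runs in front of vLLM. Everything here is an assumption for illustration: the limit values, the `admit` function, and the in-memory token bucket are hypothetical and are not part of vLLM or of any particular gateway product.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

MAX_MESSAGE_CHARS = 32_000  # hypothetical cap; tune to the workload
BUCKET_CAPACITY = 5.0       # hypothetical burst size per API key
REFILL_PER_SECOND = 1.0     # hypothetical sustained rate per API key


@dataclass
class TokenBucket:
    capacity: float
    refill_per_second: float
    tokens: float = field(init=False)
    updated_at: Optional[float] = field(default=None, init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available, refilling based on elapsed time."""
        now = time.monotonic() if now is None else now
        if self.updated_at is None:
            self.updated_at = now
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


_buckets: Dict[str, TokenBucket] = {}


def admit(api_key: str, messages: List[dict], now: Optional[float] = None) -> Tuple[bool, str]:
    """Decide whether a chat request may be forwarded to the inference server."""
    # Non-string content (for example multimodal parts) is measured via str().
    total_chars = sum(len(str(m.get("content", ""))) for m in messages)
    if total_chars > MAX_MESSAGE_CHARS:
        return False, "payload too large"
    bucket = _buckets.setdefault(api_key, TokenBucket(BUCKET_CAPACITY, REFILL_PER_SECOND))
    if not bucket.allow(now):
        return False, "rate limit exceeded"
    return True, "ok"
```

A length cap bounds the cost of patterns whose backtracking grows polynomially, but a pattern with exponential behavior can be expensive even on inputs of a few dozen characters, so these checks are a stopgap rather than a replacement for upgrading.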

How is it classified?

DoS
Inference API
AML.T0029 - Denial of AI Service
AML.T0034.001 - Resource-Intensive Queries
AML.T0040 - AI Model Inference API Access
AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2.6 - AI system operation and monitoring

NIST AI RMF

MANAGE 2.4 - Risks are prioritized and responded to based on impact

OWASP LLM Top 10

LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2025-71379?

CVE-2025-71379 covers three regular expression denial of service (ReDoS) vulnerabilities in vLLM versions >= 0.6.3 and < 0.9.0, located in the LoRA utilities (vllm/lora/utils.py), the phi4mini tool parser, and the OpenAI-compatible chat serving endpoint. An authenticated API user can trigger catastrophic regex backtracking with a single crafted request, exhausting CPU and denying service to other users of the same instance. Upgrading to vLLM 0.9.0 or later resolves the issue; until then, enforce input length limits and per-key rate caps at the API gateway.

Is CVE-2025-71379 actively exploited?

No confirmed active exploitation of CVE-2025-71379 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-71379?

1. Upgrade vLLM to >= 0.9.0 (patch release addressing all three vulnerable regex components).
2. If immediate upgrade is not feasible: enforce maximum input character/token limits at the API gateway or load balancer layer before requests reach vLLM.
3. Implement per-API-key rate limiting to reduce the impact window of sustained ReDoS attempts.
4. Monitor inference server per-thread CPU utilization; sustained 100% on a single core correlated with specific API requests is a candidate indicator of exploitation.
5. For LoRA-based deployments, validate and sanitize adapter-related request parameters at ingress.
6. Audit which external parties hold API keys and revoke access for untrusted or unnecessary principals until the patch is applied.

What systems are affected by CVE-2025-71379?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, OpenAI-compatible API endpoints, Fine-tuned model serving (LoRA), Agent frameworks, RAG pipelines.

What is the CVSS score for CVE-2025-71379?

CVE-2025-71379 has a CVSS v3.1 base score of 4.3 (MEDIUM).

What is the AI security impact?

Affected AI Architectures

LLM inference serving, OpenAI-compatible API endpoints, Fine-tuned model serving (LoRA), Agent frameworks, RAG pipelines

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service

AML.T0034.001 Resource-Intensive Queries

AML.T0040 AI Model Inference API Access

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: A.6.2.6

NIST AI RMF: MANAGE 2.4

OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

vLLM versions >= 0.6.3 and < 0.9.0 contain multiple regular expression denial of service (ReDoS) vulnerabilities. Several regex patterns — in vllm/lora/utils.py, the phi4mini tool parser, and the OpenAI-compatible serving chat endpoint — are susceptible to catastrophic backtracking. An attacker submitting crafted input with nested or repeated structures can trigger severe CPU consumption and performance degradation, resulting in denial of service.
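To make catastrophic backtracking concrete, the sketch below times a failing match against a textbook nested-quantifier pattern. The pattern `(a+)+$` is a generic illustration and is not one of the regexes from vLLM; the advisory does not quote the vulnerable patterns, so none are reproduced here. Input sizes are kept small so the script finishes quickly.

```python
import re
import time

# Illustrative pattern only: NOT taken from vLLM. The nested quantifier lets
# the engine split a run of 'a' characters in exponentially many ways.
NESTED = re.compile(r"(a+)+$")


def time_failed_match(n: int) -> float:
    """Time a failing match against 'a' * n followed by a non-matching char."""
    subject = "a" * n + "!"
    start = time.perf_counter()
    result = NESTED.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing '!' prevents any match
    return elapsed


if __name__ == "__main__":
    for n in (14, 16, 18, 20):
        print(f"n={n:2d}  {time_failed_match(n):.4f}s")
```

With a backtracking engine such as Python's `re`, the elapsed time for this pattern roughly doubles with each extra character, which is why a short crafted string can occupy a CPU core for a long time.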

Exploitation Scenario

An attacker with a low-privilege API key (a free trial user, a compromised downstream application credential, or an insider) sends a single POST to the vLLM /v1/chat/completions endpoint with a message body containing deeply nested or pathologically repeated string structures designed to trigger catastrophic backtracking in the vulnerable regex. The chat serving layer's regex consumes 100% of a CPU core for an extended duration, causing inference request queue depth to spike. Legitimate requests time out. In a shared multi-tenant inference deployment, a single low-privilege request denies service to all users. The attacker repeats the request at low frequency to sustain the outage with minimal effort. No ML knowledge, no model access, and no specialized tooling are required beyond the endpoint URL, a valid API key, and a crafted payload string.

Weaknesses (CWE)

CWE-1333 Inefficient Regular Expression Complexity (Primary)

CWE-1333 — Inefficient Regular Expression Complexity: The product uses a regular expression with a worst-case computational complexity that is inefficient and possibly exponential.

[Architecture and Design] Use regular expressions that do not support backtracking, e.g. by removing nested quantifiers.
[System Configuration] Set backtracking limits in the configuration of the regular expression implementation, such as PHP's pcre.backtrack_limit. Also consider limits on execution time for the process.
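The first mitigation, removing nested quantifiers, can be illustrated with the same kind of textbook pattern. In the sketch below, `(a+)+$` and `a+$` accept exactly the same strings, but only the flattened form fails in linear time when used with `re.match`. Both patterns are generic examples chosen for illustration and are not the patterns changed in the vLLM fix.

```python
import re

NESTED = re.compile(r"(a+)+$")  # nested quantifier: exponential on failure
FLAT = re.compile(r"a+$")       # same accepted strings, single quantifier


def same_verdict(subject: str) -> bool:
    """Return True if both patterns agree on whether the subject matches."""
    return (NESTED.match(subject) is None) == (FLAT.match(subject) is None)


if __name__ == "__main__":
    # Short inputs only for NESTED, to keep its running time small.
    samples = ["", "a", "aaaa", "aaab", "baaa", "aaaa!"]
    print(all(same_verdict(s) for s in samples))

    # The flattened pattern rejects a long hostile input without blowing up.
    print(FLAT.match("a" * 100_000 + "!") is None)
```

Python's standard `re` module has no built-in timeout or backtracking limit, so the second mitigation (execution limits) has to be applied outside the regex call, for example at the process or request level.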

Source: MITRE CWE corpus.