GHSA-j828-28rj-hfhp: vllm: ReDoS in inference endpoints enables DoS

GHSA-j828-28rj-hfhp MEDIUM
Published May 28, 2025
CISO Take

If your organization runs vllm for LLM inference (including OpenAI-compatible APIs), an authenticated low-privilege user can trigger CPU exhaustion via crafted inputs to LoRA utilities, tool parsers, or chat endpoints. Upgrade to vllm >= 0.9.0 immediately; if patching is delayed, enforce strict input length limits at the API gateway layer. This is a genuine availability risk for production LLM serving infrastructure.

Risk Assessment

Medium severity (CVSS 4.3, AV:N/AC:L/PR:L/UI:N) but operationally significant for teams running vllm in production. The attack requires only low privileges — standard API access is sufficient — and no user interaction. Exploitation is trivial: crafted strings with nested patterns trigger catastrophic regex backtracking, pinning CPU threads. In multi-tenant or public-facing LLM serving deployments the blast radius extends to all concurrent inference requests. Not in CISA KEV and no evidence of active exploitation, but the PoC pattern is well-understood and weaponizable by any threat actor familiar with ReDoS.

Affected Systems

| Package | Ecosystem | Vulnerable Range | Patched |
|---------|-----------|------------------|---------|
| vllm | pip | >= 0.6.3, < 0.9.0 | 0.9.0 |


Severity & Risk

CVSS 3.1
4.3 / 10
EPSS
N/A
Exploitation Status
No known exploitation
Sophistication
Trivial

Attack Surface

AV Network
AC Low
PR Low
UI None
S Unchanged
C None
I None
A Low

Recommended Action

5 steps
  1. PATCH

    Upgrade vllm to >= 0.9.0 (fix in PR #18454, commit 4fc1bf8). This is the only complete remediation.

  2. WORKAROUND

    If patching is delayed: enforce maximum input length limits at the API gateway or load balancer (e.g., 4096-8192 chars per request field); reject oversized payloads before they reach vllm.

  3. RATE LIMIT

    Apply per-user/per-IP rate limiting on inference endpoints to slow down brute-force ReDoS attempts.

  4. MONITOR

    Alert on sustained high CPU usage per inference worker process — ReDoS will manifest as CPU spikes without proportional GPU utilization.

  5. DETECT

    Log and flag requests with unusual nesting depth (e.g., deeply nested parentheses, brackets) in tool-call or LoRA parameter fields.
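
The workaround and detection steps above can be sketched as a gateway-side pre-filter. This is a minimal illustration, not vllm code: the helper names and the 8192-character / depth-32 thresholds are assumptions to be tuned per deployment.

```python
MAX_FIELD_LEN = 8192   # illustrative cap for step 2 (WORKAROUND)
MAX_NEST_DEPTH = 32    # illustrative threshold for step 5 (DETECT)

def nesting_depth(text: str) -> int:
    """Maximum depth of (), [], {} nesting in `text`."""
    depth = max_depth = 0
    for ch in text:
        if ch in "([{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch in ")]}":
            depth = max(0, depth - 1)
    return max_depth

def prefilter(field: str) -> tuple[bool, str]:
    """Decide, before a request reaches vllm, whether a field is acceptable.

    Returns (allowed, reason) so the rejection reason can be logged.
    """
    if len(field) > MAX_FIELD_LEN:
        return False, "field exceeds length limit"
    if nesting_depth(field) > MAX_NEST_DEPTH:
        return False, "suspicious nesting depth"
    return True, "ok"
```

Rejecting on raw length alone already bounds the worst-case regex cost; the nesting-depth check additionally flags the bracket-heavy payloads that characterize this class of ReDoS input.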

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Art.15 - Accuracy, Robustness and Cybersecurity
ISO 42001
A.6.2.6 - AI System Robustness and Reliability
NIST AI RMF
MG-2.2 - AI Risk Management — Incident Response
MS-2.5 - AI Risk Measurement — Robustness Testing
OWASP LLM Top 10
LLM10:2025 - Unbounded Consumption

Frequently Asked Questions

What is GHSA-j828-28rj-hfhp?

If your organization runs vllm for LLM inference (including OpenAI-compatible APIs), an authenticated low-privilege user can trigger CPU exhaustion via crafted inputs to LoRA utilities, tool parsers, or chat endpoints. Upgrade to vllm >= 0.9.0 immediately; if patching is delayed, enforce strict input length limits at the API gateway layer. This is a genuine availability risk for production LLM serving infrastructure.

Is GHSA-j828-28rj-hfhp actively exploited?

No confirmed active exploitation of GHSA-j828-28rj-hfhp has been reported, but organizations should still patch proactively.

How to fix GHSA-j828-28rj-hfhp?

1. PATCH: Upgrade vllm to >= 0.9.0 (fix in PR #18454, commit 4fc1bf8). This is the only complete remediation.
2. WORKAROUND (if patching is delayed): Enforce maximum input length limits at the API gateway or load balancer (e.g., 4096-8192 chars per request field); reject oversized payloads before they reach vllm.
3. RATE LIMIT: Apply per-user/per-IP rate limiting on inference endpoints to slow down brute-force ReDoS attempts.
4. MONITOR: Alert on sustained high CPU usage per inference worker process — ReDoS will manifest as CPU spikes without proportional GPU utilization.
5. DETECT: Log and flag requests with unusual nesting depth (e.g., deeply nested parentheses, brackets) in tool-call or LoRA parameter fields.

What systems are affected by GHSA-j828-28rj-hfhp?

This vulnerability affects the following AI/ML architecture patterns: model serving, LLM inference, LoRA fine-tuning serving, tool calling / function calling pipelines, agent frameworks using vllm as backend.

What is the CVSS score for GHSA-j828-28rj-hfhp?

GHSA-j828-28rj-hfhp has a CVSS v3.1 base score of 4.3 (MEDIUM).

Technical Details

NVD Description

### Summary

A recent review identified several regular expressions in the vllm codebase that are susceptible to Regular Expression Denial of Service (ReDoS) attacks. These patterns, if fed with crafted or malicious input, may cause severe performance degradation due to catastrophic backtracking.

#### 1. vllm/lora/utils.py ([Line 173](https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/vllm/lora/utils.py#L173))

**Risk Description:**

- The regex `r"\((.*?)\)\$?$"` matches content inside parentheses. If input such as `((((a|)+)+)+)` is passed in, it can cause catastrophic backtracking, leading to a ReDoS vulnerability.
- Using `.*?` (non-greedy match) inside group parentheses can be highly sensitive to input length and nesting complexity.

**Remediation Suggestions:**

- Limit the input string length.
- Use a non-recursive matching approach, or write a regex with stricter content constraints.
- Consider using possessive quantifiers or atomic groups (not supported in Python yet), or split and process before regex matching.

#### 2. vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py ([Line 52](https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py#L52))

**Risk Description:**

- The regex `r'functools\[(.*?)\]'` uses `.*?` to match content inside brackets, together with `re.DOTALL`. If the input contains a large number of nested or crafted brackets, it can cause backtracking and ReDoS.

**Remediation Suggestions:**

- Limit the length of `model_output`.
- Use a stricter, non-greedy pattern (avoid matching across extraneous nesting).
- Prefer `re.finditer()` and enforce a length constraint on each match.

#### 3. vllm/entrypoints/openai/serving_chat.py ([Line 351](https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/vllm/entrypoints/openai/serving_chat.py#L351))

**Risk Description:**

- The regex `r'.*"parameters":\s*(.*)'` can trigger backtracking if `current_text` is very long and contains repeated structures.
- Especially when processing strings from unknown sources, `.*` matching any content is high risk.

**Remediation Suggestions:**

- Use a more specific pattern (e.g., via JSON parsing).
- Impose limits on `current_text` length.
- Avoid using `.*` to capture large blocks of text; prefer structured parsing when possible.

#### 4. benchmarks/benchmark_serving_structured_output.py ([Line 650](https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/benchmarks/benchmark_serving_structured_output.py#L650))

**Risk Description:**

- The regex `r'\{.*\}'` is used to extract JSON inside curly braces. If the `actual` string is very long with unbalanced braces, it can cause backtracking, leading to a ReDoS vulnerability.
- Although this is used for benchmark correctness checking, it should still handle abnormal inputs carefully.

**Remediation Suggestions:**

- Limit the length of `actual`.
- Prefer stepwise search for `{` and `}` or use a robust JSON extraction tool.
- Recommend first locating the range with simple string search, then applying regex.

### Fix

- https://github.com/vllm-project/vllm/pull/18454
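The remediation pattern suggested above (a length limit plus a stricter character class) can be illustrated for the `lora/utils.py` case. This is a hedged sketch, not the actual fix merged in PR #18454: the `MAX_LEN` value and the function name are assumptions for illustration.

```python
import re

MAX_LEN = 4096  # illustrative input cap; ReDoS cost grows with input length

# Original vulnerable pattern: r"\((.*?)\)\$?$"
# Hardened variant: [^()]* forbids nested parentheses inside the capture
# group, so the engine has no ambiguous paren content to backtrack over.
SAFE_PAREN = re.compile(r"\(([^()]*)\)\$?$")

def extract_paren_suffix(text: str):
    """Return the trailing parenthesised content, or None if rejected."""
    if len(text) > MAX_LEN:        # length limit bounds worst-case work
        return None
    m = SAFE_PAREN.search(text)
    return m.group(1) if m else None
```

The same two-part recipe, reject oversized input first and constrain the repeated character class second, applies to the other three patterns listed above; for `serving_chat.py`, structured JSON parsing (as the advisory suggests) is the stronger fix.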

Exploitation Scenario

An adversary with a standard API key (low privilege — e.g., a trial user or compromised credential) submits a POST to /v1/chat/completions on a vllm-served endpoint. The request includes a crafted 'tool_calls' field with deeply nested bracket structures like 'functools[((((a|)+)+)+)]' targeting the phi4mini_tool_parser regex, or a malformed LoRA adapter name matching the vulnerable pattern in lora/utils.py. The regex engine enters catastrophic backtracking, consuming 100% of one or more CPU threads for seconds to minutes per request. By sending a continuous stream of such requests — easily achievable with a low-rate flood from a single client — the adversary degrades inference throughput for all legitimate users, effectively taking the LLM serving endpoint offline without triggering volumetric DoS defenses.
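
The low-rate flood described here is exactly what the per-user rate limiting in step 3 blunts. A minimal token-bucket sketch, framework-agnostic, with illustrative rate and burst values:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-key token bucket: refills `rate` tokens/sec, bursts to `capacity`."""

    def __init__(self, rate: float = 5.0, capacity: float = 10.0):
        self.rate = rate
        self.capacity = capacity
        # key -> (tokens remaining, timestamp of last update)
        self.state = defaultdict(lambda: (capacity, time.monotonic()))

    def allow(self, key: str) -> bool:
        """Consume one token for `key`; False means reject the request."""
        tokens, last = self.state[key]
        now = time.monotonic()
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self.state[key] = (tokens, now)
            return False
        self.state[key] = (tokens - 1.0, now)
        return True
```

Keyed on API key or source IP in front of the inference endpoint, this caps how many backtracking-triggering requests a single client can queue per second, even when each individual request looks benign to volumetric defenses.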

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:L

Timeline

Published
May 28, 2025
Last Modified
May 28, 2025
First Seen
March 24, 2026

Related Vulnerabilities