CVE-2025-62426

GHSA-69j4-grxj-j64p MEDIUM
Published November 21, 2025
CISO Take

If your organization runs vLLM versions 0.5.5 through 0.11.0 to serve LLM inference APIs, any authenticated user with minimal access can deliberately stall your entire API server by sending crafted chat_template_kwargs parameters, denying service to all other users. Patch to vLLM 0.11.1 immediately; if patching is not possible today, add request-level timeouts and block or sanitize the chat_template_kwargs parameter at your API gateway before it reaches vLLM. This is a low-complexity, network-reachable DoS with no user interaction required.

Affected Systems

Package   Ecosystem   Vulnerable Range      Patched
vllm      pip         >= 0.5.5, < 0.11.1    0.11.1

Severity & Risk

CVSS 3.1
6.5 / 10
EPSS
0.1%
chance of exploitation in 30 days
KEV Status
Not in KEV
Sophistication
Trivial

Recommended Action

  1. PATCH: Upgrade vLLM to 0.11.1 or later; this is the only complete fix.
  2. WORKAROUND (if patching is not immediate): Add a middleware or API gateway rule to strip or reject the chat_template_kwargs parameter from all incoming requests to /v1/chat/completions and /tokenize.
  3. RATE LIMIT: Enforce per-client request rate limits and maximum concurrent request counts at the proxy or load balancer layer (nginx, Traefik, or Envoy).
  4. TIMEOUT: Configure server-side request processing timeouts to cap how long any single request can hold the processing thread.
  5. DETECT: Alert on requests to vLLM endpoints containing chat_template_kwargs in your WAF or API gateway logs; investigate any request with abnormally high server-side latency (>30s).
  6. AUDIT: Review API access logs to identify any accounts that may have already attempted exploitation.
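The workaround in step 2 can be sketched as a small sanitizer that a gateway or middleware layer runs on each request body before forwarding it to vLLM. This is an illustrative sketch, not production middleware: the function name, the choice to strip rather than reject, and the guarded path list are assumptions you should adapt to your own gateway.

```python
import json

BLOCKED_PARAM = "chat_template_kwargs"
GUARDED_PATHS = {"/v1/chat/completions", "/tokenize"}


def sanitize_request(path: str, body: bytes) -> bytes:
    """Strip the vulnerable parameter from requests to guarded endpoints.

    Intended to run in a gateway/middleware layer in front of vLLM.
    Returns the (possibly rewritten) body unchanged for other paths;
    raises ValueError (via json.loads) on malformed JSON so the gateway
    can reject the request outright.
    """
    if path not in GUARDED_PATHS:
        return body
    payload = json.loads(body)
    if BLOCKED_PARAM in payload:
        # Drop the parameter rather than rejecting the whole request,
        # so legitimate clients that send it incidentally still get served.
        # A stricter policy would return HTTP 400 instead.
        del payload[BLOCKED_PARAM]
    return json.dumps(payload).encode()
```

Stripping silently degrades functionality for clients that legitimately rely on chat_template_kwargs; if that matters in your deployment, reject with a clear error instead so callers know to upgrade their integration once you patch to 0.11.1.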

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 9 - Risk management system
ISO 42001
A.9.2 - AI system operation and monitoring
A.9.3 - AI system availability and resilience
NIST AI RMF
MANAGE 2.2 - Risks or system failures are responded to in accordance with previously identified roles and responsibilities
MAP 5.1 - Likelihood and magnitude of each identified impact based on impacts to organizations, individuals, and society
OWASP LLM Top 10
LLM04 - Model Denial of Service
LLM10:2025 - Unbounded Consumption

Technical Details

NVD Description

vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1.
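The flaw described above is an order-of-operations issue: the request's template kwargs reach the template engine before they are checked. The generic defensive pattern is validate-before-use. The sketch below is illustrative only; it is not vLLM's actual 0.11.1 patch, and the allowed-key list and scalar-type restriction are placeholder assumptions.

```python
def validate_template_kwargs(kwargs: dict, allowed_keys: list) -> dict:
    """Generic validate-before-use pattern: reject unknown or non-scalar
    template kwargs before they ever reach the template engine.

    Illustrative only -- not the actual vLLM fix. The allowlist should
    be derived from the variables your chat template actually accepts.
    """
    unknown = set(kwargs) - set(allowed_keys)
    if unknown:
        raise ValueError(f"unexpected chat_template_kwargs: {sorted(unknown)}")
    for key, value in kwargs.items():
        # Restricting values to scalars blocks deeply nested structures
        # that could drive pathological template-rendering behavior.
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError(f"{key} must be a scalar, got {type(value).__name__}")
    return kwargs
```

The essential property is that validation happens before any rendering work is scheduled, so a malicious payload is rejected cheaply instead of tying up a processing thread.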

Exploitation Scenario

An adversary obtains low-privilege API credentials, whether via a free trial account, a compromised developer key, or a misconfigured multi-tenant setup. They craft a POST request to /v1/chat/completions with a malicious chat_template_kwargs payload designed to trigger unbounded resource consumption in vLLM's template processing path (chat_utils.py L1602-1610). Because the parameter is consumed before validation, the server thread blocks indefinitely processing the malicious request. The adversary repeats this with a handful of concurrent requests to monopolize all available server threads. All legitimate inference requests queue indefinitely, causing a full service outage. The adversary may use this to cause SLA violations, to disrupt a competing AI product, or as a smokescreen for other API abuse occurring simultaneously.
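A retrospective hunt for this pattern can be scripted against gateway access logs, combining the two detection signals from the guidance above: presence of chat_template_kwargs in the request body and abnormally high server-side latency. This assumes JSON-lines logs with "path", "body", and "latency_s" fields; adapt the field names to your gateway's actual log schema.

```python
import json


def flag_suspicious(log_lines, latency_threshold_s: float = 30.0) -> list:
    """Return log entries matching either detection signal:
    a request body carrying chat_template_kwargs, or server-side
    latency above the threshold.

    Assumes JSON-lines access logs with "path", "body", and
    "latency_s" fields (an assumption -- adjust to your format).
    """
    hits = []
    for line in log_lines:
        entry = json.loads(line)
        has_param = "chat_template_kwargs" in entry.get("body", "")
        slow = entry.get("latency_s", 0.0) > latency_threshold_s
        if has_param or slow:
            hits.append(entry)
    return hits
```

Flagged entries are leads, not verdicts: chat_template_kwargs is a legitimate parameter, so correlate hits with the requesting account, request volume, and whether the latency spike coincided with service degradation.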

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Timeline

Published
November 21, 2025
Last Modified
December 4, 2025
First Seen
November 21, 2025