CVE-2025-59425: vLLM: timing attack enables API key bypass

GHSA-wr9h-g72x-mwhm HIGH PoC AVAILABLE
Published October 7, 2025
CISO Take

vLLM instances using built-in API key validation (--api-key flag) are vulnerable to timing-based authentication bypass via statistical analysis of response times. Upgrade to vLLM 0.11.0 immediately. If upgrade is blocked, offload authentication to a reverse proxy (nginx, Kong, cloud API gateway) and disable vLLM's native key validation entirely.

What is the risk?

Moderate-to-high risk for production deployments. Exploitation is fully automatable and requires no prior credentials, only network access and enough HTTP requests (hundreds to thousands) to collect reliable timing statistics. The EPSS score is low today (about 0.5%), but timing attack tooling is widely available and the technique is well documented. Internet-exposed vLLM endpoints and multi-tenant inference clusters face the highest exposure. Internal deployments with network segmentation are at lower risk but not immune.

What systems are affected?

Package  Ecosystem  Vulnerable Range  Patched
vLLM     pip        (not listed)      No patch
vLLM     pip        < 0.11.0          0.11.0

How severe is it?

CVSS 3.1: 7.5 / 10
EPSS: 0.5% chance of exploitation in 30 days (higher than 41% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Moderate
Exploitation Confidence: medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

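Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): None
Availability (A): None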

What should I do?

5 steps
  1. PATCH

    Upgrade to vLLM >= 0.11.0, which uses constant-time comparison (hmac.compare_digest).

  2. WORKAROUND

    If immediate upgrade is blocked, terminate API authentication at the reverse proxy layer (nginx auth_request, Kong key-auth, AWS API Gateway authorizer) and remove the --api-key flag from vLLM.

  3. ROTATE

    After patching, rotate all API keys that were in use on affected instances.

  4. DETECT

    Review access logs for anomalous authentication patterns — thousands of 401 responses with incremental key variations from a single IP.

  5. RATE-LIMIT

    Implement aggressive rate limiting on the authentication endpoint to significantly increase the time required for timing analysis, even pre-patch.
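
The constant-time comparison named in the PATCH step can be sketched as follows. This is a minimal illustration, assuming a single configured key; the helper name `is_valid_api_key` is hypothetical and the code is not taken from vLLM.

```python
import hashlib
import hmac


def is_valid_api_key(presented: str, expected: str) -> bool:
    """Compare two API keys without leaking how many leading characters match.

    Hypothetical helper for illustration; not vLLM's implementation.
    """
    # Hashing first gives both inputs the same fixed length, so the comparison
    # does not reveal the length of the expected key either.
    presented_digest = hashlib.sha256(presented.encode("utf-8")).digest()
    expected_digest = hashlib.sha256(expected.encode("utf-8")).digest()
    # hmac.compare_digest takes time independent of where the inputs differ.
    return hmac.compare_digest(presented_digest, expected_digest)
```

The design point is that the time to reject a wrong key no longer depends on how much of it is correct, which removes the signal the attack relies on.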

What does CISA's SSVC say?

Decision: Track
Exploitation: none
Automatable: Yes
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.1.6 - AI system security
NIST AI RMF
MANAGE 2.2 - Mechanisms to respond to AI risks
OWASP LLM Top 10
LLM10:2025 - Unbounded Consumption

Frequently Asked Questions

What is CVE-2025-59425?

CVE-2025-59425 is a timing side channel in vLLM's built-in API key validation (--api-key flag). Before the fix, the key comparison took longer the more leading characters of the supplied key were correct, so statistical analysis of response times can recover the key and bypass authentication. Upgrade to vLLM 0.11.0 immediately. If upgrade is blocked, offload authentication to a reverse proxy (nginx, Kong, cloud API gateway) and disable vLLM's native key validation entirely.

Is CVE-2025-59425 actively exploited?

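CISA's SSVC assessment lists exploitation as "none", so that source records no active exploitation. However, proof-of-concept exploit code is publicly available for CVE-2025-59425, which increases the risk of exploitation.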

How to fix CVE-2025-59425?

1. PATCH: Upgrade to vLLM >= 0.11.0, which uses constant-time comparison (hmac.compare_digest).
2. WORKAROUND: If immediate upgrade is blocked, terminate API authentication at the reverse proxy layer (nginx auth_request, Kong key-auth, AWS API Gateway authorizer) and remove the --api-key flag from vLLM.
3. ROTATE: After patching, rotate all API keys that were in use on affected instances.
4. DETECT: Review access logs for anomalous authentication patterns, such as thousands of 401 responses with incremental key variations from a single IP.
5. RATE-LIMIT: Implement aggressive rate limiting on the authentication endpoint to significantly increase the time required for timing analysis, even pre-patch.
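
The DETECT step can be approximated by counting 401 responses per client IP. The sketch below assumes an access log in common/combined format (client IP first, status code right after the quoted request line); the function name and the threshold are hypothetical and should be tuned to the deployment. It covers only the volume signal, since submitted keys are normally not written to access logs.

```python
import re
from collections import Counter

# Assumed log shape: IP first, status code after the quoted request line.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .*?"[^"]*" (?P<status>\d{3}) ')


def noisy_auth_failures(lines, threshold=1000):
    """Return {ip: count} for clients with at least `threshold` 401 responses."""
    failures = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "401":
            failures[match.group("ip")] += 1
    return {ip: n for ip, n in failures.items() if n >= threshold}
```

A call such as `noisy_auth_failures(open("access.log"), threshold=1000)` would list the addresses worth investigating; the file name here is a placeholder.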

What systems are affected by CVE-2025-59425?

This vulnerability affects the following AI/ML architecture patterns: model serving, LLM inference APIs, AI API gateways, multi-tenant inference clusters.

What is the CVSS score for CVE-2025-59425?

CVE-2025-59425 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 0.54%.

What is the AI security impact?

Affected AI Architectures

model serving, LLM inference APIs, AI API gateways, multi-tenant inference clusters

MITRE ATLAS Techniques

AML.T0006 Active Scanning
AML.T0034 Cost Harvesting
AML.T0040 AI Model Inference API Access
AML.T0049 Exploit Public-Facing Application
AML.T0106 Exploitation for Credential Access

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.1.6
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM10:2025

What are the technical details?

Original Advisory

vLLM is an inference and serving engine for large language models (LLMs). Before version 0.11.0rc2, the API key support in vLLM performs validation using a method that was vulnerable to a timing attack. API key validation uses a string comparison that takes longer the more characters the provided API key gets correct. Data analysis across many attempts could allow an attacker to determine when it finds the next correct character in the key sequence. Deployments relying on vLLM's built-in API key validation are vulnerable to authentication bypass using this technique. Version 0.11.0rc2 fixes the issue.
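
To illustrate why a comparison that "takes longer the more characters the provided API key gets correct" leaks information, the sketch below counts the characters examined instead of measuring time. `naive_equals` and the sample key are hypothetical stand-ins for an early-exit comparison, not vLLM's code.

```python
def naive_equals(candidate: str, secret: str) -> tuple[bool, int]:
    """Early-exit comparison that also reports how many characters it examined.

    Hypothetical stand-in for a non-constant-time string comparison.
    """
    examined = 0
    for a, b in zip(candidate, secret):
        examined += 1
        if a != b:
            return False, examined  # stops at the first mismatch
    return len(candidate) == len(secret), examined


secret = "sk-7f3a"  # made-up key for the demonstration
for guess in ("xxxxxxx", "sk-xxxx", "sk-7fxx", "sk-7f3a"):
    print(guess, naive_equals(guess, secret))
```

The amount of work grows with the length of the correct prefix; over a network that difference is small and noisy, which is why the attack needs many samples per candidate.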

Exploitation Scenario

An adversary discovers a vLLM instance through internet-wide scan data (Shodan, Censys); the /health endpoint exposed by vLLM's OpenAI-compatible server is a clear fingerprint. They write a Python script that iterates over each character position of the API key, sending batches of requests with candidate characters and measuring the median response time of each batch. Because Python's == operator can exit early on the first mismatch, a candidate with more correct leading characters takes measurably longer to reject. After O(N * charset_size * samples) requests, where N is the key length, charset_size is the number of possible characters per position, and samples is the number of requests sent per candidate, the full API key is reconstructed. The process is fully automatable and can finish in hours on a fast connection. The attacker then authenticates with the recovered key and queries the model to extract sensitive system prompts, runs cost-harvesting inference loops, or uses the access as an initial foothold for lateral movement.
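
The O(N * charset_size * samples) bound can be turned into a rough estimate of attack duration and of the effect of rate limiting. The key length, alphabet size, samples per candidate, and request rates below are illustrative assumptions, not measurements of any real deployment.

```python
def timing_attack_requests(key_length: int, charset_size: int, samples: int) -> int:
    """Worst-case request count: every candidate at every position, `samples` times each."""
    return key_length * charset_size * samples


def hours_at_rate(total_requests: int, requests_per_second: float) -> float:
    """Wall-clock hours needed to send `total_requests` at a fixed rate."""
    return total_requests / requests_per_second / 3600


# Illustrative assumptions: 32-character key, 62-symbol alphabet, 100 samples per candidate.
total = timing_attack_requests(key_length=32, charset_size=62, samples=100)
print(total)                               # worst-case number of requests
print(round(hours_at_rate(total, 50), 1))  # hours for a client sending 50 requests/s
print(round(hours_at_rate(total, 1), 1))   # hours when limited to 1 request/s
```

Attack time scales inversely with the permitted request rate, so rate limiting slows the attack but does not remove the side channel.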

Weaknesses (CWE)

CWE-385 — Covert Timing Channel: Covert timing channels convey information by modulating some aspect of system behavior over time, so that the program receiving the information can observe system behavior and infer protected information.

  • [Architecture and Design] Whenever possible, specify implementation strategies that do not introduce time variances in operations.
  • [Implementation] Often one can artificially manipulate the time which operations take or -- when operations occur -- can remove information from the attacker.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
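
As a cross-check, the 7.5 base score can be recomputed from this vector with the CVSS v3.1 base score equations. The sketch handles only scope-unchanged vectors and carries only the metric weights needed here; the function names are illustrative.

```python
import math

# CVSS v3.1 weights for the metric values used by this vector (scope unchanged).
WEIGHTS = {
    "AV": {"N": 0.85},
    "AC": {"L": 0.77},
    "PR": {"N": 0.85},
    "UI": {"N": 0.85},
    "C": {"H": 0.56, "N": 0.0},
    "I": {"H": 0.56, "N": 0.0},
    "A": {"H": 0.56, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the CVSS v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a scope-unchanged CVSS v3.1 vector covered by WEIGHTS."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles S:U")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"))  # 7.5
```

The score is driven by the network-reachable, unauthenticated, low-complexity exploitability metrics combined with a high confidentiality impact; integrity and availability contribute nothing.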

Timeline

Published
October 7, 2025
Last Modified
October 21, 2025
First Seen
October 7, 2025

Related Vulnerabilities