CVE-2025-59425 — HIGH (CVSS 7.5) AI Security Vulnerability

CISO Take

vLLM instances using built-in API key validation (--api-key flag) are vulnerable to timing-based authentication bypass via statistical analysis of response times. Upgrade to vLLM 0.11.0 immediately. If upgrade is blocked, offload authentication to a reverse proxy (nginx, Kong, cloud API gateway) and disable vLLM's native key validation entirely.

Risk Assessment

Moderate-to-high risk for production deployments. Exploitation is fully automatable and requires no prior credentials — only network access and patience for hundreds to thousands of HTTP requests. The EPSS score is low today (0.37%) but timing attack tooling is widely available and the technique is well-documented. Internet-exposed vLLM endpoints or multi-tenant inference clusters face the highest exposure. Internal deployments with network segmentation are lower risk but not immune.

Affected Systems

Package	Ecosystem	Vulnerable Range	Patched
vllm	pip	—	No patch
vllm	pip	< 0.11.0	`0.11.0`

Severity & Risk

CVSS 3.1

7.5 / 10

EPSS

0.4%

chance of exploitation in 30 days

Higher than 59% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Moderate

Exploitation Confidence

medium

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV Network

AC Low

PR None

UI None

S Unchanged

C High

I None

A None
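The metric values above can be checked against the reported 7.5 base score. The following is a minimal sketch of the CVSS v3.1 base-score formula for the scope-unchanged case, using the numeric weights from the CVSS v3.1 specification; the function and constant names are illustrative and not part of any official library.

```python
import math

# CVSS v3.1 weights for the metric values listed above (Scope: Unchanged).
AV_NETWORK = 0.85
AC_LOW = 0.77
PR_NONE = 0.85
UI_NONE = 0.85
C_HIGH, I_NONE, A_NONE = 0.56, 0.0, 0.0


def roundup(value):
    """Round up to one decimal place using the integer method from the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """Base score for a vector whose Scope metric is Unchanged."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)       # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_NETWORK, AC_LOW, PR_NONE, UI_NONE, C_HIGH, I_NONE, A_NONE
)
print(score)  # 7.5
```

The score is driven entirely by the confidentiality impact (a recovered API key) combined with the easiest possible exploitability values: network reachable, low complexity, no privileges, no user interaction.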

Recommended Action

5 steps

PATCH

Upgrade to vLLM >= 0.11.0, which uses constant-time comparison (hmac.compare_digest).

WORKAROUND

If immediate upgrade is blocked, terminate API authentication at the reverse proxy layer (nginx auth_request, Kong key-auth, AWS API Gateway authorizer) and remove the --api-key flag from vLLM.

ROTATE

After patching, rotate all API keys that were in use on affected instances.

DETECT

Review access logs for anomalous authentication patterns, such as thousands of 401 responses with incremental key variations from a single IP.

RATE-LIMIT

Implement aggressive rate limiting on the authentication endpoint to significantly increase the time required for timing analysis, even pre-patch.
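For teams that validate keys in their own middleware or proxy code, the constant-time comparison named in the PATCH step can be sketched as below. This is an illustration of the technique, not vLLM's actual implementation; the helper name `is_valid_key` is hypothetical. Hashing both values first is an added assumption that keeps the compared inputs the same length regardless of what the client sends.

```python
import hashlib
import hmac


def is_valid_key(presented: str, expected: str) -> bool:
    """Compare two API keys without leaking how many leading characters match.

    Hypothetical helper for illustration only.
    """
    # Hash both sides so the compared byte strings always have equal length.
    presented_digest = hashlib.sha256(presented.encode("utf-8")).digest()
    expected_digest = hashlib.sha256(expected.encode("utf-8")).digest()
    # hmac.compare_digest runs in time independent of where the inputs differ.
    return hmac.compare_digest(presented_digest, expected_digest)


print(is_valid_key("sk-example", "sk-example"))  # True
print(is_valid_key("sk-exampl0", "sk-example"))  # False
```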

CISA SSVC Assessment

Decision Track

Exploitation none

Automatable Yes

Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Auth Bypass
Inference API
AML.T0006 - Active Scanning
AML.T0034 - Cost Harvesting
AML.T0040 - AI Model Inference API Access
AML.T0049 - Exploit Public-Facing Application
AML.T0106 - Exploitation for Credential Access

Compliance Impact

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.1.6 - AI system security

NIST AI RMF

MANAGE 2.2 - Mechanisms to respond to AI risks

OWASP LLM Top 10

LLM10:2025 - Unbounded Consumption

Frequently Asked Questions

What is CVE-2025-59425?

vLLM instances using built-in API key validation (--api-key flag) are vulnerable to timing-based authentication bypass via statistical analysis of response times. Upgrade to vLLM 0.11.0 immediately. If upgrade is blocked, offload authentication to a reverse proxy (nginx, Kong, cloud API gateway) and disable vLLM's native key validation entirely.

Is CVE-2025-59425 actively exploited?

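CISA's SSVC assessment lists exploitation as "none", so there is no confirmed exploitation in the wild. However, proof-of-concept exploit code is publicly available for CVE-2025-59425, which increases the risk of exploitation.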

How to fix CVE-2025-59425?

1. PATCH: Upgrade to vLLM >= 0.11.0, which uses constant-time comparison (hmac.compare_digest).
2. WORKAROUND: If immediate upgrade is blocked, terminate API authentication at the reverse proxy layer (nginx auth_request, Kong key-auth, AWS API Gateway authorizer) and remove the --api-key flag from vLLM.
3. ROTATE: After patching, rotate all API keys that were in use on affected instances.
4. DETECT: Review access logs for anomalous authentication patterns, such as thousands of 401 responses with incremental key variations from a single IP.
5. RATE-LIMIT: Implement aggressive rate limiting on the authentication endpoint to significantly increase the time required for timing analysis, even pre-patch.
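The DETECT step can be approximated with a simple count of failed authentications per source address. The sketch below assumes access-log entries have already been parsed into `(ip, status)` pairs; that input shape, the function name `flag_auth_probing`, and the threshold of 1000 are all hypothetical and should be adapted to the actual log format and traffic baseline.

```python
from collections import Counter


def flag_auth_probing(events, threshold=1000):
    """Return source IPs with at least `threshold` HTTP 401 responses.

    `events` is an iterable of (ip, status_code) pairs (assumed format).
    """
    failures = Counter(ip for ip, status in events if status == 401)
    return sorted(ip for ip, count in failures.items() if count >= threshold)


# Synthetic example data, not real log output.
events = (
    [("10.0.0.5", 401)] * 1500   # sustained failures from one address
    + [("10.0.0.9", 401)] * 3    # a few typos from a legitimate client
    + [("10.0.0.9", 200)] * 50
)
print(flag_auth_probing(events))  # ['10.0.0.5']
```

A per-IP count will miss attackers who spread requests across many addresses, so it is a starting point rather than complete coverage.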

What systems are affected by CVE-2025-59425?

This vulnerability affects the following AI/ML architecture patterns: model serving, LLM inference APIs, AI API gateways, multi-tenant inference clusters.

What is the CVSS score for CVE-2025-59425?

CVE-2025-59425 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 0.37%.

Technical Details

NVD Description

vLLM is an inference and serving engine for large language models (LLMs). Before version 0.11.0rc2, the API key support in vLLM performs validation using a method that was vulnerable to a timing attack. API key validation uses a string comparison that takes longer the more characters the provided API key gets correct. Data analysis across many attempts could allow an attacker to determine when it finds the next correct character in the key sequence. Deployments relying on vLLM's built-in API key validation are vulnerable to authentication bypass using this technique. Version 0.11.0rc2 fixes the issue.
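To make the leak concrete, the sketch below imitates an early-exit comparison in pure Python and reports how many characters it examined before returning. It is a simplified model of the behavior the description refers to, not vLLM's code; the key value and the function name `early_exit_compare` are hypothetical.

```python
def early_exit_compare(candidate: str, secret: str):
    """Compare character by character and return (match, characters_examined)."""
    examined = 0
    if len(candidate) != len(secret):
        return False, examined
    for a, b in zip(candidate, secret):
        examined += 1
        if a != b:
            # Returning here is the leak: work done depends on the matching prefix.
            return False, examined
    return True, examined


secret = "sk-7f3a"  # hypothetical key for illustration
for guess in ("xx-0000", "sk-0000", "sk-7f00", "sk-7f3a"):
    print(guess, early_exit_compare(guess, secret))
# xx-0000 (False, 1)
# sk-0000 (False, 4)
# sk-7f00 (False, 6)
# sk-7f3a (True, 7)
```

The number of characters examined grows with the length of the correct prefix. In a real server that difference shows up as a very small timing difference, which is why an attacker needs many samples per guess to separate it from network noise.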

Exploitation Scenario

An adversary discovers a vLLM instance via passive internet scanning (Shodan, Censys; the /health endpoint of vLLM's OpenAI-compatible server is a clear fingerprint). They write a Python script that iterates over each character position of the API key, sending batches of requests with candidate characters and measuring the median response time of each batch. Because Python's == operator compares strings character by character and stops at the first mismatch, candidates with more correct leading characters produce measurably longer comparison times. After O(N * charset_size * samples) requests, where N is the key length, the full API key is reconstructed; the process is fully automatable and takes hours on a fast connection. The attacker then authenticates with the recovered key and queries the model to extract sensitive system prompts, runs cost-harvesting inference loops, or uses the access as an initial foothold for lateral movement.
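The request estimate in this scenario also shows why rate limiting helps before a patch is applied. The sketch below evaluates N * charset_size * samples for illustrative values; the key length of 32, the charset of 62, the 100 samples per candidate, and both request rates are assumptions chosen for the arithmetic, not measurements from a real deployment.

```python
def requests_needed(key_length, charset_size, samples_per_candidate):
    """Upper bound on requests: every candidate character at every position."""
    return key_length * charset_size * samples_per_candidate


def hours_at_rate(total_requests, requests_per_second):
    """Wall-clock hours to send total_requests at a fixed request rate."""
    return total_requests / requests_per_second / 3600


# Illustrative assumptions: 32-character key, [A-Za-z0-9], 100 samples per guess.
total = requests_needed(key_length=32, charset_size=62, samples_per_candidate=100)
print(total)                               # 198400
print(round(hours_at_rate(total, 50), 1))  # 1.1  (unthrottled, 50 req/s)
print(round(hours_at_rate(total, 1), 1))   # 55.1 (rate-limited to 1 req/s)
```

Under these assumptions, throttling a client from 50 requests per second to 1 stretches the attack from about an hour to more than two days. That buys time for detection and patching but does not remove the underlying timing leak.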