CVE-2026-48746: vllm: auth bypass exposes OpenAI inference API

GHSA-94f4-hr76-p5j6 CRITICAL
Published June 16, 2026
CISO Take

vllm's OpenAI-compatible API middleware contains a critical authentication bypass (CVSS 9.1). An unauthenticated attacker can reach the LLM inference API by embedding URL-special characters such as '/' or '?' in the HTTP Host header: starlette reconstructs a URL whose parsed .path no longer starts with '/v1', so the API key check is skipped, while FastAPI still routes the request by its actual HTTP path. The package has 130 downstream dependents. EPSS currently puts the exploitation probability at about 0.1% (higher than roughly 23% of scored CVEs), but the attack needs only a single crafted HTTP request, with no credentials, privileges, or user interaction. Any vllm endpoint that relies on an API key and is exposed directly, without an RFC-conforming reverse proxy, should therefore be treated as unauthenticated. The fix is to upgrade to vllm 0.22.0. Organizations that cannot patch right away can place nginx or an equivalent standards-compliant proxy in front of vllm; according to the advisory, instances behind such a proxy are not affected.

Sources: NVD, EPSS, GitHub Advisory, ATLAS

What is the risk?

Critical. CVSS 9.1 with a fully network-accessible, zero-privilege, zero-interaction attack vector makes this a top-priority patch for any team running vllm. The vulnerability is deterministic and trivially exploitable: a single malformed HTTP request is sufficient, and no AI or ML knowledge is required. The main mitigating factor is that deployments behind RFC-conforming reverse proxies (nginx, Caddy, HAProxy) are not affected, because the malformed Host header does not reach vllm. Direct-to-vllm exposure, which is common in internal ML serving clusters, Kubernetes pod services, and developer sandboxes, removes this protection. The EPSS score is currently low (about 0.1%, higher than roughly 23% of scored CVEs) and no exploitation is known, but the public advisory from x41-dsec.de and the simplicity of the request mean that exploit development and scanning could follow quickly.

How does the attack unfold?

Discovery (AML.T0006)
Attacker scans for vllm instances exposed directly on port 8000 without a normalizing reverse proxy, using banner grabbing, Shodan queries for vllm version strings, or internal network enumeration.

Authentication Bypass (AML.T0049)
Attacker crafts an HTTP request with a Host header containing special URL characters (e.g., 'attacker.com?x='). starlette's URL reconstruction then produces a .path value that does not start with '/v1', so AuthenticationMiddleware skips the API key check.

Inference API Access (AML.T0040)
FastAPI routes the request based on the actual HTTP path and serves /v1/chat/completions or any other API endpoint normally, granting the attacker full unauthenticated model access.

Impact (AML.T0034)
Attacker exploits unrestricted model access for cost harvesting via resource-intensive queries, extraction of system prompts or session data, or GPU compute exhaustion causing denial of service for legitimate users.

What systems are affected?

Package   Ecosystem   Vulnerable Range       Patched
vLLM      pip         >= 0.3.0, < 0.22.0     0.22.0

82.8K | 130 dependents | Pushed 2d ago | 43% patched | ~30d to patch

Do you run vLLM >= 0.3.0 and < 0.22.0 with an API key, reachable without an RFC-conforming reverse proxy? You're affected.

How severe is it?

CVSS 3.1: 9.1 / 10
EPSS: 0.1% chance of exploitation in 30 days (higher than 23% of all CVEs)
Exploitation Status: No known exploitation
Sophistication: Trivial

What is the attack surface?

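Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): None
Availability (A): High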

What should I do?

  1. PATCH

    Upgrade vllm to >= 0.22.0 immediately. This is the only complete fix.

  2. WORKAROUND (if patching is not immediate)

    Place an RFC-conforming reverse proxy (nginx, Caddy, HAProxy) in front of vllm. Per the advisory, instances behind such a proxy are not affected.

  3. NETWORK CONTROLS

    Firewall vllm API ports to known client IPs; do not expose port 8000 directly to the internet or untrusted network segments.

  4. DETECTION

    Audit access logs for Host headers containing '/', '?', '@', or other special URL characters, which are strong indicators of exploit attempts. Alert on inference requests where the Host header does not match the configured domain.

  5. VERIFY EXPOSURE

    Against an instance that has an API key configured, run 'curl -H "Host: attacker.com?x=" http://<vllm-host>/v1/models' without supplying the key. If model data is returned instead of a 401 response, the instance is vulnerable.
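
The detection step can be approximated with a small log filter. The following is a minimal sketch, assuming access logs have already been parsed into dicts with a "host" key; that schema, the function names, and the allow-list of expected hosts are illustrative assumptions, not part of vllm or the advisory.

```python
import re

# Assumption: a well-formed Host value is "hostname[:port]" or "[IPv6][:port]".
# Anything else (for example a value containing '/', '?', '@' or '#') is flagged.
_VALID_HOST = re.compile(r"(?:\[[0-9A-Fa-f:.]+\]|[A-Za-z0-9.-]+)(?::\d{1,5})?")


def is_suspicious_host(host: str) -> bool:
    """Return True when the Host header is not a plain host[:port] value."""
    return _VALID_HOST.fullmatch(host.strip()) is None


def audit(records, expected_hosts):
    """Yield log records whose Host header is malformed or unexpected.

    `records` is an iterable of dicts with a "host" key (hypothetical schema).
    `expected_hosts` is a set of lowercase host[:port] strings you serve.
    """
    for rec in records:
        host = rec.get("host", "")
        if is_suspicious_host(host) or host.lower() not in expected_hosts:
            yield rec


if __name__ == "__main__":
    sample = [
        {"host": "api.example.com:8000", "path": "/v1/models"},
        {"host": "attacker.com?x=", "path": "/v1/chat/completions"},
    ]
    for hit in audit(sample, {"api.example.com:8000"}):
        print("suspicious request:", hit)
```

Whether the raw Host header is logged at all depends on the server and proxy configuration, so the log format may need to be extended before a filter like this is useful.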

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.6 - AI system security
NIST AI RMF
MANAGE-2.2 - Mechanisms exist for tracking identified AI risks
OWASP LLM Top 10
LLM10:2023 - Model Theft

Frequently Asked Questions

Is CVE-2026-48746 actively exploited?

No confirmed active exploitation of CVE-2026-48746 has been reported, but organizations should still patch proactively.

What systems are affected by CVE-2026-48746?

This vulnerability affects the following AI/ML architecture patterns: model serving, LLM inference API, AI agent backends, MLOps inference pipelines, RAG generation layer.

What is the CVSS score for CVE-2026-48746?

CVE-2026-48746 has a CVSS v3.1 base score of 9.1 (CRITICAL). The EPSS exploitation probability is 0.07% (about 0.1% when rounded).

What is the AI security impact?

Affected AI Architectures

model serving, LLM inference API, AI agent backends, MLOps inference pipelines, RAG generation layer

MITRE ATLAS Techniques

AML.T0006 Active Scanning
AML.T0034 Cost Harvesting
AML.T0034.000 Excessive Queries
AML.T0040 AI Model Inference API Access
AML.T0049 Exploit Public-Facing Application
AML.T0107 Exploitation for Defense Evasion

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.2.6
NIST AI RMF: MANAGE-2.2
OWASP LLM Top 10: LLM10:2023

What are the technical details?

Original Advisory

### Summary

A vulnerability in ASGI web servers and starlette's trust on those web servers enables an authentication bypass of the OpenAI API `AuthenticationMiddleware`, which was discovered during @x41sec's source code audit. It allows use of the API without providing the configured `VLLM_API_KEY` or `--api-key`.

### Details

In https://github.com/vllm-project/vllm/blob/v0.14.0/vllm/entrypoints/openai/api_server.py#L689-L692 the `url_path` is taken from the `URL`, which is reconstructed by _starlette_ based on the request `scope`.

```py
from starlette.datastructures import URL, Headers, MutableHeaders, State

# ...
# The path is taken from the URL rebuilt from the scope (Host header included).
url_path = URL(scope=scope).path.removeprefix(root_path)
headers = Headers(scope=scope)
# The token is only verified when the re-parsed path starts with /v1.
if url_path.startswith("/v1") and not self.verify_token(headers):
    response = JSONResponse(content={"error": "Unauthorized"}, status_code=401)
    return response(scope, receive, send)
return self.app(scope, receive, send)
```

The request `scope` includes the request's `Host:` header, and the URL is reconstructed as shown below:

```py
f"{scheme}://{host_header}{path}"
```

Neither starlette nor [any of the ASGI servers](https://asgi.readthedocs.io/en/latest/implementations.html#servers) (including uvicorn, which vllm uses) properly filter the `Host:` header for invalid characters. This allows an attacker to include special URL characters such as `/` or `?` in the `Host:` header and thereby control the reconstructed URL and its `.path` attribute. FastAPI/starlette's routing uses the HTTP path and does not depend on the parsed url.path attribute, allowing attackers to reach an endpoint via a certain path while providing a different value in the `.path`.

### Impact

- Instances of vllm that use an API Key for the OpenAI API and expose the API to attackers.
- Instances behind an RFC-conforming web server (such as nginx) are **not** affected.
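
The mismatch described in the advisory can be reproduced without vllm or starlette. The sketch below is an approximation: it uses the standard library's `urlsplit` as a stand-in for starlette's `URL` class, and the helper names `reconstructed_path` and `auth_is_enforced` are hypothetical. It only illustrates why the prefix test fails once the Host header contains URL-special characters.

```python
from urllib.parse import urlsplit


def reconstructed_path(scheme: str, host_header: str, raw_path: str) -> str:
    """Rebuild the URL as f"{scheme}://{host_header}{path}" and re-parse it."""
    url = f"{scheme}://{host_header}{raw_path}"
    return urlsplit(url).path


def auth_is_enforced(host_header: str, raw_path: str) -> bool:
    """Mirror the middleware's prefix test on the re-parsed path."""
    return reconstructed_path("http", host_header, raw_path).startswith("/v1")


raw_path = "/v1/chat/completions"  # the path the router actually sees

# Benign Host header: the re-parsed path still starts with /v1.
print(auth_is_enforced("api.example.com", raw_path))  # True

# '?' in the Host header pushes the real path into the query string.
print(auth_is_enforced("attacker.com?x=", raw_path))  # False

# '/' in the Host header prepends an extra path segment.
print(auth_is_enforced("attacker.com/x", raw_path))   # False
```

In both malicious cases the router would still dispatch on `raw_path`, which is why the request reaches the endpoint even though the check was skipped.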

Exploitation Scenario

An attacker identifies a vllm instance (version < 0.22.0) serving an OpenAI-compatible API directly on port 8000, with no normalizing proxy such as nginx in the path. Such instances are discoverable via Shodan, internal network scans, or reconnaissance of internal ML infrastructure. The attacker sends an HTTP POST to /v1/chat/completions with the Host header set to 'attacker.com?x='. starlette reconstructs the URL as 'http://attacker.com?x=/v1/chat/completions', and because everything after the '?' is parsed as the query string, the parsed .path is empty. The AuthenticationMiddleware checks whether this path starts with '/v1'. It does not, so the auth check is skipped and the request is forwarded to the FastAPI application, which routes on the actual HTTP path '/v1/chat/completions' and serves the endpoint normally. The attacker now has unrestricted LLM access: they can query the hosted model without a valid key, extract system prompts, run resource-intensive generation requests to exhaust the victim's GPU budget, or repurpose the inference endpoint as free compute for their own tasks.

Weaknesses (CWE)

CWE-444 — Inconsistent Interpretation of HTTP Requests ('HTTP Request/Response Smuggling'): The product acts as an intermediary HTTP agent (such as a proxy or firewall) in the data flow between two entities such as a client and server, but it does not interpret malformed HTTP requests or responses in ways that are consistent with how the messages will be processed by those entities that are at the ultimate destination.

  • [Implementation] Use a web server that employs a strict HTTP parsing procedure, such as Apache [REF-433].
  • [Implementation] Use only SSL communication.

Source: MITRE CWE corpus.
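
Strict parsing can also be approximated inside the application as defense in depth. The following is a hypothetical pure-ASGI middleware, not the vllm patch and not a starlette or uvicorn API: the class name `StrictHostMiddleware`, the regular expression, and the `status_for` demo helper are assumptions made for illustration. It rejects any HTTP request whose Host header is missing, duplicated, or not a plain host[:port] value, which may be too strict for HTTP/1.0 clients that omit the header.

```python
import asyncio
import re

# Assumption: accept only "hostname[:port]" or "[IPv6][:port]".
_HOST_RE = re.compile(r"(?:\[[0-9A-Fa-f:.]+\]|[A-Za-z0-9.-]+)(?::\d{1,5})?")


class StrictHostMiddleware:
    """Hypothetical ASGI middleware: answer 400 when Host is not host[:port]."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            hosts = [
                value.decode("latin-1")
                for key, value in scope.get("headers", [])
                if key == b"host"
            ]
            if len(hosts) != 1 or _HOST_RE.fullmatch(hosts[0]) is None:
                await send({
                    "type": "http.response.start",
                    "status": 400,
                    "headers": [(b"content-type", b"text/plain")],
                })
                await send({"type": "http.response.body", "body": b"Bad Request"})
                return
        await self.app(scope, receive, send)


async def status_for(host: bytes) -> int:
    """Demo helper: run one fake request through the middleware, return status."""

    async def ok_app(scope, receive, send):
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": b"ok"})

    sent = []

    async def send(message):
        sent.append(message)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    scope = {"type": "http", "path": "/v1/models", "headers": [(b"host", host)]}
    await StrictHostMiddleware(ok_app)(scope, receive, send)
    return sent[0]["status"]


if __name__ == "__main__":
    print(asyncio.run(status_for(b"api.example.com:8000")))
    print(asyncio.run(status_for(b"attacker.com?x=")))
```

A guard like this would need to run before any authentication middleware to be effective, and it does not replace upgrading or fronting the service with a standards-compliant proxy.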

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:H
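
The 9.1 base score can be re-derived from this vector with the CVSS v3.1 base-score equations. The sketch below is a minimal implementation of the base metrics only (no temporal or environmental metrics); the function and table names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"

    # Impact sub-score.
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score.
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:H"))  # 9.1
```

For this vector the impact sub-score is about 5.18 and the exploitability sub-score about 3.89; their sum, roughly 9.06, rounds up to 9.1.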

Timeline

Published: June 16, 2026
Last Modified: June 16, 2026
First Seen: June 16, 2026

Related Vulnerabilities