vllm's OpenAI-compatible API middleware contains a critical authentication bypass (CVSS 9.1). An unauthenticated attacker can reach the LLM inference API by embedding URL delimiter characters such as '/' or '?' in the HTTP Host header: starlette reconstructs a URL whose parsed .path attribute fails the middleware's '/v1' prefix test, so the API key check is skipped, while FastAPI still routes the request by its actual path. The package has 130 downstream dependents and an EPSS score in the 77th percentile for exploitation probability. Any deployment that exposes a vllm endpoint directly, without an RFC-conforming reverse proxy, can be accessed with a single crafted HTTP request: no credentials, no privileges, and no user interaction required. The fix is to upgrade to vllm 0.22.0; organizations that cannot patch right away can place nginx or an equivalent standards-compliant proxy in front of vllm, since Host header normalization at the proxy neutralizes the attack.
What is the risk?
Critical. CVSS 9.1 with a fully network-accessible, zero-privilege, zero-interaction attack vector makes this a top-priority patch for any team running vllm. The vulnerability is deterministic and trivially exploitable — a single malformed HTTP request is sufficient, and no AI or ML knowledge is required. The only natural mitigating factor is that deployments behind RFC-conforming reverse proxies (nginx, Caddy, HAProxy) are not affected, as those normalize the Host header before it reaches vllm. Direct-to-vllm exposure — common in internal ML serving clusters, Kubernetes pod services, and developer sandboxes — eliminates this protection. The 77th EPSS percentile and public advisory from x41-dsec.de signal that exploit development and active scanning are likely to follow quickly.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vLLM | pip | >= 0.3.0, < 0.22.0 | 0.22.0 |
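If you run vLLM >= 0.3.0 and < 0.22.0 with an API key configured, and the API is reachable without an RFC-conforming reverse proxy in front of it, you are affected.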
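To check an installed version against the range in the table, a small helper along these lines may be enough. It is a minimal sketch: the function names are hypothetical, and the parser deliberately ignores pre-release and local suffixes, so a release candidate such as `0.22.0rc1` is treated like `0.22.0`.

```python
import re

# Range taken from the table above: >= 0.3.0 and < 0.22.0.
FIRST_VULNERABLE = (0, 3, 0)
FIRST_PATCHED = (0, 22, 0)


def parse_release(version: str) -> tuple[int, int, int]:
    """Parse the leading 'X.Y.Z' of a version string; missing parts default to 0.

    Suffixes such as 'rc1', '.post1' or '+cu121' are ignored (a simplification).
    """
    match = re.match(r"\s*v?(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) if part else 0 for part in match.groups())
    return (major, minor, patch)


def is_in_vulnerable_range(version: str) -> bool:
    """True when the version falls inside the vulnerable range from the table."""
    return FIRST_VULNERABLE <= parse_release(version) < FIRST_PATCHED
```

On a host where the package is installed, the version string could be obtained with `importlib.metadata.version("vllm")` and passed to `is_in_vulnerable_range`.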
What should I do?
1. PATCH: Upgrade vllm to >= 0.22.0 immediately; this is the only complete fix.
2. WORKAROUND (if patching is not immediate): Place an RFC-conforming reverse proxy (nginx, Caddy, HAProxy) in front of vllm; Host header normalization at the proxy layer blocks the exploit.
3. NETWORK CONTROLS: Firewall vllm API ports to known client IPs; do not expose port 8000 directly to the internet or untrusted network segments.
4. DETECTION: Audit access logs for Host headers containing '/', '?', '@', or other special URL characters; these are reliable indicators of exploit attempts. Alert on inference requests where the Host header does not match the configured domain.
5. VERIFY EXPOSURE: Run `curl -H "Host: attacker.com?x=" http://<vllm-host>/v1/models` without an API key; if model data is returned, the instance is vulnerable.
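The detection step can be sketched as a small classifier over Host header values. This assumes your logging actually records the Host header of each request, which is not guaranteed by default; the allow-list hostnames and function names below are hypothetical placeholders.

```python
# Characters named in the detection step, plus '#' and '\\' as assumed extras.
SUSPICIOUS_CHARS = set("/?@#\\")

# Hypothetical allow-list; replace with the hostnames your deployment serves.
EXPECTED_HOSTS = {"llm.example.internal", "llm.example.internal:8000"}


def classify_host(host: str, expected: set[str] = EXPECTED_HOSTS) -> str:
    """Return 'exploit-like', 'unexpected' or 'ok' for one Host header value."""
    if any(ch in SUSPICIOUS_CHARS for ch in host):
        return "exploit-like"
    if host.lower() not in expected:
        return "unexpected"
    return "ok"


def flag_hosts(hosts):
    """Yield (host, verdict) for every Host value that is not 'ok'."""
    for host in hosts:
        verdict = classify_host(host)
        if verdict != "ok":
            yield host, verdict
```

Feeding the Host values extracted from a log file into `flag_hosts` separates likely exploit attempts from requests that merely target an unexpected domain, which matches the two alert conditions in step 4.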
Frequently Asked Questions
Is CVE-2026-48746 actively exploited?
No confirmed active exploitation of CVE-2026-48746 has been reported, but organizations should still patch proactively.
What systems are affected by CVE-2026-48746?
This vulnerability affects the following AI/ML architecture patterns: model serving, LLM inference API, AI agent backends, MLOps inference pipelines, RAG generation layer.
What is the CVSS score for CVE-2026-48746?
CVE-2026-48746 has a CVSS v3.1 base score of 9.1 (CRITICAL). The EPSS exploitation probability is 0.07%.
What is the AI security impact?
MITRE ATLAS Techniques
- AML.T0006 Active Scanning
- AML.T0034 Cost Harvesting
- AML.T0034.000 Excessive Queries
- AML.T0040 AI Model Inference API Access
- AML.T0049 Exploit Public-Facing Application
- AML.T0107 Exploitation for Defense Evasion
What are the technical details?
Original Advisory
### Summary

A vulnerability in ASGI web servers, and in starlette's trust in those web servers, enables an authentication bypass of the OpenAI API `AuthenticationMiddleware`. It was discovered during @x41sec's source code audit. It allows use of the API without providing the configured `VLLM_API_KEY` or `--api-key`.

### Details

In https://github.com/vllm-project/vllm/blob/v0.14.0/vllm/entrypoints/openai/api_server.py#L689-L692 the `url_path` is taken from the `URL`, which is reconstructed by _starlette_ based on the request `scope`.

```py
from starlette.datastructures import URL, Headers, MutableHeaders, State

# ...

# The path is read from the URL that starlette reconstructs from the scope,
# so it is influenced by the client-supplied Host header.
url_path = URL(scope=scope).path.removeprefix(root_path)
headers = Headers(scope=scope)
# Only paths under /v1 are subject to the API key check.
if url_path.startswith("/v1") and not self.verify_token(headers):
    response = JSONResponse(content={"error": "Unauthorized"}, status_code=401)
    return response(scope, receive, send)
return self.app(scope, receive, send)
```

The request `scope` includes the request's `Host:` header, and starlette reconstructs the URL from it as shown below:

```py
f"{scheme}://{host_header}{path}"
```

Neither starlette nor [any of the ASGI servers](https://asgi.readthedocs.io/en/latest/implementations.html#servers) (including uvicorn, which vllm uses) properly filter the `Host:` header for invalid characters. This allows an attacker to include special URL characters such as `/` or `?` in the `Host:` header and thereby control the reconstructed URL and its `.path` attribute.

FastAPI/starlette's routing uses the HTTP path and does not depend on the parsed `url.path` attribute, allowing attackers to reach an endpoint via a certain path while providing a different value in the `.path`.

### Impact

- Instances of vllm that use an API key for the OpenAI API and expose the API to attackers are affected.
- Instances behind an RFC-conforming web server (such as nginx) are **not** affected.
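One way to avoid the mismatch described in the advisory is to base the auth decision on the raw ASGI path, the same value the router matches, instead of on a URL rebuilt from the Host header. The following is an illustrative sketch only: `requires_auth` is a hypothetical helper and is not claimed to be what the upstream patch in 0.22.0 does.

```python
def requires_auth(scope: dict, root_path: str = "") -> bool:
    """Decide whether the API-key check applies, using only the raw ASGI path.

    scope["path"] is taken from the HTTP request line, so the Host header
    cannot influence this decision.
    """
    path = scope["path"].removeprefix(root_path)
    return path.startswith("/v1")


# A request carrying the malicious Host header still requires authentication.
malicious_scope = {
    "type": "http",
    "path": "/v1/chat/completions",
    "headers": [(b"host", b"attacker.com?x=")],
}
print(requires_auth(malicious_scope))
```

The design point is that the check and the router should read the same source of truth; any component that re-derives the path from client-controlled headers reintroduces the inconsistency.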
Exploitation Scenario
An attacker identifies a vllm instance (version < 0.22.0) serving an OpenAI-compatible API directly on port 8000, with no reverse proxy such as nginx in the path. Such instances are discoverable via Shodan, internal network scans, or reconnaissance of internal ML infrastructure. The attacker crafts an HTTP POST to /v1/chat/completions with the Host header set to 'attacker.com?x='. From this, starlette reconstructs the URL as 'http://attacker.com?x=/v1/chat/completions', and the parsed .path comes out empty because the query string absorbs the remainder. The AuthenticationMiddleware checks whether this path starts with '/v1'; it does not, so the auth check is skipped and the request is forwarded to the FastAPI application. FastAPI routes on the actual HTTP path '/v1/chat/completions' and serves the endpoint normally. The attacker now has unrestricted LLM access: they can query the hosted model without a valid key, extract system prompts, run resource-intensive generation requests to exhaust the victim's GPU budget, or repurpose the inference endpoint as free compute for their own tasks.
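The path mismatch in this scenario can be reproduced with the standard library alone. The sketch below uses `urllib.parse.urlsplit` as a stand-in for starlette's URL parsing, which is an assumption; the helper names and the example hostnames are hypothetical. In this stand-in, the parsed path for the '?' payload is empty, and in every malicious case it no longer starts with '/v1'.

```python
from urllib.parse import urlsplit


def reconstructed_path(host_header: str, request_path: str, scheme: str = "http") -> str:
    """Mimic the reconstruction quoted in the advisory, then parse the result."""
    url = f"{scheme}://{host_header}{request_path}"
    return urlsplit(url).path


def auth_check_runs(host_header: str, request_path: str) -> bool:
    """True when the middleware's '/v1' prefix test would trigger."""
    return reconstructed_path(host_header, request_path).startswith("/v1")


request_path = "/v1/chat/completions"
for host in ("llm.example.internal:8000", "attacker.com?x=", "attacker.com/x"):
    print(f"{host!r:32} path={reconstructed_path(host, request_path)!r:28} "
          f"auth_check={auth_check_runs(host, request_path)}")
```

The benign Host value leaves the path intact, so the check runs. The '?' payload moves the real path into the query string, and the '/' payload pushes it behind an extra path segment; both make the prefix test fail while the router still sees the original request path.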
Weaknesses (CWE)
CWE-444 — Inconsistent Interpretation of HTTP Requests ('HTTP Request/Response Smuggling'): The product acts as an intermediary HTTP agent (such as a proxy or firewall) in the data flow between two entities such as a client and server, but it does not interpret malformed HTTP requests or responses in ways that are consistent with how the messages will be processed by those entities that are at the ultimate destination.
- [Implementation] Use a web server that employs a strict HTTP parsing procedure, such as Apache [REF-433].
- [Implementation] Use only SSL communication.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:H
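The 9.1 base score follows from this vector using the CVSS v3.1 base score equations. The sketch below covers only Scope: Unchanged vectors and only the metric values needed here; the function names are hypothetical, and the weights are the published v3.1 constants.

```python
import math

# Metric weights from the CVSS v3.1 specification (only the values needed here).
WEIGHTS = {
    "AV": {"N": 0.85},
    "AC": {"L": 0.77},
    "PR": {"N": 0.85},
    "UI": {"N": 0.85},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal, following the spec's integer-based Roundup."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a Scope: Unchanged CVSS v3.1 vector."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:H"))
```

For this vector the impact sub-score is about 5.18 and the exploitability sub-score about 3.89; their sum, roughly 9.06, rounds up to 9.1. Integrity is rated None because the bypass grants use of the API rather than the ability to modify data.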
Related Vulnerabilities
| CVE | CVSS | Description | Relation |
|---|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server | Same package: vllm |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue | Same package: vllm |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference | Same package: vllm |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access | Same package: vllm |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ | Same package: vllm |