CVE-2026-48746: vLLM auth bypass exposes OpenAI

CISO Take

vLLM's OpenAI-compatible API middleware contains a critical authentication bypass (CVSS 9.1) that lets an unauthenticated attacker reach the LLM inference API by embedding URL delimiter characters (for example '?') in the HTTP Host header. Starlette reconstructs a URL whose parsed .path attribute does not start with the prefix the API key check looks for, so the check is skipped, while FastAPI still routes the actual request normally. With 130 downstream dependents and an EPSS score in the 77th percentile for exploitation probability, any organization exposing a vLLM endpoint directly, without an RFC-conforming reverse proxy, can have its inference API used through a single crafted HTTP request: no credentials, no privileges, no user interaction required. The fix is to upgrade to vLLM 0.22.0 immediately; organizations that cannot patch right now can place nginx or an equivalent standards-compliant proxy upstream, as Host header normalization neutralizes the attack entirely.

Sources: NVD, EPSS, GitHub Advisory, ATLAS

What is the risk?

Critical. CVSS 9.1 with a network-accessible, zero-privilege, zero-interaction attack vector makes this a top-priority patch for any team running vLLM. The vulnerability is deterministic and trivially exploitable: a single malformed HTTP request is sufficient, and no AI or ML knowledge is required. The main mitigating factor is that deployments behind RFC-conforming reverse proxies (nginx, Caddy, HAProxy) are not affected, because those proxies normalize the Host header before it reaches vLLM. Direct-to-vLLM exposure, common in internal ML serving clusters, Kubernetes pod services, and developer sandboxes, removes this protection. The 77th-percentile EPSS ranking and the public advisory from x41-dsec.de suggest that exploit development and active scanning are likely to follow quickly.

How does the attack unfold?

Discovery

Attacker scans for vllm instances exposed directly on port 8000 without a normalizing reverse proxy, using banner grabbing, Shodan queries for vllm version strings, or internal network enumeration.

AML.T0006

Authentication Bypass

Attacker crafts an HTTP request with a Host header containing special URL characters (e.g., 'attacker.com?x='). Starlette's URL reconstruction then produces a .path value that does not start with '/v1', so AuthenticationMiddleware skips the API key check.

AML.T0049

Inference API Access

FastAPI routes the request based on the actual HTTP path and serves the /v1/chat/completions or any other API endpoint normally, granting the attacker full unauthenticated model access.

AML.T0040

Impact

Attacker exploits unrestricted model access for cost harvesting via resource-intensive queries, extraction of system prompts or session data, or GPU compute exhaustion causing denial of service for legitimate users.

AML.T0034

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
vLLM	pip	>= 0.3.0, < 0.22.0	0.22.0
87.2K | 129 dependents | Pushed 5d ago | 26% patched | ~51d to patch

Do you run vLLM >= 0.3.0 and < 0.22.0? You're affected.

How severe is it?

CVSS 3.1: 9.1 / 10

EPSS: 1.2% chance of exploitation in 30 days (higher than 64% of all CVEs)

Source: EPSS v3, FIRST.org

Exploitation Status: Exploit Available

Exploitation: MEDIUM

Sophistication: Trivial

Exploitation Confidence: medium

○ CISA SSVC: Public PoC

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Network

AC Low

PR None

UI None

S Unchanged

C High

I None

A High
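As a worked check, the 9.1 base score can be reproduced from the metrics listed above (vector CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:H). The sketch below follows the CVSS v3.1 base-score equations for the Scope Unchanged case only; the function and table names are illustrative, not part of any library.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope Unchanged values for PR).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for vectors with S:U (Scope Unchanged) only."""
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[c]) * (1 - cia[i]) * (1 - cia[a])
    impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][av] * WEIGHTS["AC"][ac]
        * WEIGHTS["PR"][pr] * WEIGHTS["UI"][ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:N / AC:L / PR:N / UI:N / S:U / C:H / I:N / A:H
print(base_score_scope_unchanged("N", "L", "N", "N", "H", "N", "H"))  # 9.1
```

The impact sub-score here is about 5.18 and the exploitability sub-score about 3.89; their sum, roughly 9.06, rounds up to 9.1.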

What should I do?

1. PATCH: Upgrade vLLM to >= 0.22.0 immediately; this is the only complete fix.
2. WORKAROUND (if patching is not immediate): Place an RFC-conforming reverse proxy (nginx, Caddy, HAProxy) in front of vLLM; Host header normalization at the proxy layer blocks the exploit.
3. NETWORK CONTROLS: Firewall vLLM API ports to known client IPs; do not expose port 8000 directly to the internet or untrusted network segments.
4. DETECTION: Audit access logs for Host headers containing '/', '?', '@', or other special URL characters; these are reliable indicators of exploit attempts. Alert on inference requests where the Host header does not match the configured domain.
5. VERIFY EXPOSURE: Run 'curl -H "Host: attacker.com?x=" http://<vllm-host>/v1/models' without an API key. If model data is returned, the instance is vulnerable.
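The curl check above can also be scripted with the Python standard library, which is convenient when several internal hosts need checking. This is a minimal sketch: probe_status and key_check_skipped are hypothetical helper names, port 8000 is taken from the text above, and the assumption that a 200 response means the key check was skipped (and a 401 means it ran) should be confirmed against a known-good instance.

```python
import http.client


def probe_status(host: str, port: int = 8000, timeout: float = 5.0) -> int:
    """Send GET /v1/models with the crafted Host header and no API key.

    Passing an explicit Host header makes http.client use it verbatim
    instead of generating one from the connection target.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/v1/models", headers={"Host": "attacker.com?x="})
        return conn.getresponse().status
    finally:
        conn.close()


def key_check_skipped(status: int) -> bool:
    """Assumption: a 200 without credentials means the API key check did not run."""
    return status == 200


# Example (replace the address with an instance you operate):
#   status = probe_status("10.0.0.12")
#   print("vulnerable" if key_check_skipped(status) else f"status {status}")
```

Keeping the HTTP status separate from its interpretation lets the caller log unexpected responses (for example a 400 from an intermediate proxy) rather than treating them as a pass.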

What does CISA's SSVC say?

Decision: Track*

Exploitation: poc

Automatable: Yes

Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Tags: Auth Bypass, DoS, Data Extraction, Inference API

ATLAS techniques: AML.T0006 - Active Scanning; AML.T0034 - Cost Harvesting; AML.T0034.000 - Excessive Queries; AML.T0040 - AI Model Inference API Access; AML.T0049 - Exploit Public-Facing Application; AML.T0107 - Exploitation for Defense Evasion

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2.6 - AI system security

NIST AI RMF

MANAGE-2.2 - Mechanisms exist for tracking identified AI risks

OWASP LLM Top 10

LLM10:2023 - Model Theft

Frequently Asked Questions

What is CVE-2026-48746?

CVE-2026-48746 is a critical authentication bypass (CVSS 9.1) in vLLM's OpenAI-compatible API middleware. By embedding URL delimiter characters in the HTTP Host header, an unauthenticated attacker causes Starlette to reconstruct a URL whose parsed .path skips the API key check, while FastAPI still routes the actual request normally. Versions >= 0.3.0 and < 0.22.0 are affected; upgrading to 0.22.0 fixes the issue, and an RFC-conforming reverse proxy in front of vLLM blocks the attack in the meantime.

Is CVE-2026-48746 actively exploited?

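The exploitation signals listed for CVE-2026-48746 report a public proof-of-concept (CISA SSVC exploitation status: poc) rather than confirmed in-the-wild exploitation. Publicly available proof-of-concept code increases the risk of exploitation.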

How to fix CVE-2026-48746?

1. PATCH: Upgrade vLLM to >= 0.22.0 immediately; this is the only complete fix.
2. WORKAROUND (if patching is not immediate): Place an RFC-conforming reverse proxy (nginx, Caddy, HAProxy) in front of vLLM; Host header normalization at the proxy layer blocks the exploit.
3. NETWORK CONTROLS: Firewall vLLM API ports to known client IPs; do not expose port 8000 directly to the internet or untrusted network segments.
4. DETECTION: Audit access logs for Host headers containing '/', '?', '@', or other special URL characters; these are reliable indicators of exploit attempts. Alert on inference requests where the Host header does not match the configured domain.
5. VERIFY EXPOSURE: Run 'curl -H "Host: attacker.com?x=" http://<vllm-host>/v1/models' without an API key. If model data is returned, the instance is vulnerable.
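The detection step can be sketched as a small filter over logged Host header values. This is illustrative only: the record shape (client IP, Host header), the function names, and the sample hostnames are assumptions, and the character set adds '#' and backslash to the '/', '?', '@' named above as a guess at "other special URL characters".

```python
import re
from typing import Iterable, Iterator, Tuple

# '/', '?', '@' come from the detection guidance; '#' and '\' are assumptions.
SUSPICIOUS_HOST_CHARS = re.compile(r"[/?@#\\]")


def is_suspicious_host(host_header: str,
                       expected_hosts: frozenset = frozenset()) -> bool:
    """Flag Host values with URL delimiters or, if configured, unexpected names."""
    if SUSPICIOUS_HOST_CHARS.search(host_header):
        return True
    if expected_hosts and host_header.lower() not in expected_hosts:
        return True
    return False


def flag_requests(records: Iterable[Tuple[str, str]],
                  expected_hosts: frozenset = frozenset()
                  ) -> Iterator[Tuple[str, str]]:
    """Yield (client_ip, host_header) pairs that warrant a closer look."""
    for client_ip, host_header in records:
        if is_suspicious_host(host_header, expected_hosts):
            yield client_ip, host_header


# Hypothetical log extract: (client IP, Host header as received).
sample = [
    ("10.0.0.5", "llm.example.internal:8000"),
    ("203.0.113.9", "attacker.com?x="),
    ("203.0.113.9", "other.example.org"),
]
for hit in flag_requests(sample, frozenset({"llm.example.internal:8000"})):
    print(hit)
```

Passing an empty expected_hosts set disables the domain-mismatch alert and leaves only the character check, which suits logs from hosts that are legitimately reached under several names.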

What systems are affected by CVE-2026-48746?

This vulnerability affects the following AI/ML architecture patterns: model serving, LLM inference API, AI agent backends, MLOps inference pipelines, RAG generation layer.

What is the CVSS score for CVE-2026-48746?

CVE-2026-48746 has a CVSS v3.1 base score of 9.1 (CRITICAL). The EPSS exploitation probability is 1.15%.

What is the AI security impact?

Affected AI Architectures

model serving, LLM inference API, AI agent backends, MLOps inference pipelines, RAG generation layer

MITRE ATLAS Techniques

AML.T0006 Active Scanning

AML.T0034 Cost Harvesting

AML.T0034.000 Excessive Queries

AML.T0040 AI Model Inference API Access

AML.T0049 Exploit Public-Facing Application

AML.T0107 Exploitation for Defense Evasion

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: A.6.2.6

NIST AI RMF: MANAGE-2.2

OWASP LLM Top 10: LLM10:2023

What are the technical details?

Original Advisory

vLLM is an inference and serving engine for large language models (LLMs). From 0.3.0 until 0.22.0, a vulnerability in ASGI web servers and starlette's trust on those web servers enables an authentication bypass of the OpenAI API AuthenticationMiddleware. It allows to use the API without providing the configured VLLM_API_KEY or --api-key. This vulnerability is fixed in 0.22.0.

Exploitation Scenario

An attacker identifies a vllm instance (version < 0.22.0) serving an OpenAI-compatible API directly on port 8000, without nginx in the path — discoverable via Shodan, internal network scans, or reconnaissance of internal ML infrastructure. The attacker crafts an HTTP POST to /v1/chat/completions with the Host header set to 'attacker.com?x=' — starlette reconstructs the URL as 'http://attacker.com?x=/v1/chat/completions', and the parsed .path resolves to '/' since the query string absorbs the remainder. The AuthenticationMiddleware checks if this path starts with '/v1' — it does not, so the auth check is skipped and the request is forwarded to the FastAPI application. FastAPI routes based on the actual HTTP path '/v1/chat/completions' and serves the endpoint normally. The attacker now has unrestricted LLM access: they can query the hosted model without a valid key, extract system prompts, run resource-intensive generation requests to exhaust the victim's GPU budget, or repurpose the inference endpoint as free compute for their own tasks.
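The parsing mismatch in this scenario can be illustrated with the standard library alone. The sketch below is a simplified model, not vLLM's or Starlette's actual code: it assumes the URL is rebuilt by concatenating scheme, Host header, and request path and then re-parsed, and it stands in for the middleware with a plain '/v1' prefix test. The function names and the benign hostname are hypothetical. The exact parsed path may differ from what Starlette reports; what matters is that it no longer starts with '/v1'.

```python
from urllib.parse import urlsplit


def reconstructed_path(host_header: str, request_path: str,
                       scheme: str = "http") -> str:
    """Path obtained after rebuilding a URL string from Host + path and re-parsing it."""
    return urlsplit(f"{scheme}://{host_header}{request_path}").path


def auth_check_runs(host_header: str, request_path: str) -> bool:
    """Simplified stand-in for the guard: the key is only checked for /v1 paths."""
    return reconstructed_path(host_header, request_path).startswith("/v1")


# The router matches on the real request path, which never changes.
request_path = "/v1/chat/completions"

for host in ("llm.example.internal:8000", "attacker.com?x="):
    parsed = reconstructed_path(host, request_path)
    print(f"Host={host!r}: parsed path={parsed!r}, "
          f"auth check runs={auth_check_runs(host, request_path)}")
```

With the crafted Host value, the '?' ends the authority component early, so the real path lands in the query string of the rebuilt URL and the prefix test fails, even though the request is still routed to /v1/chat/completions.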

Weaknesses (CWE)

CWE-444 Inconsistent Interpretation of HTTP Requests ('HTTP Request/Response Smuggling') (Primary)

CWE-444 — Inconsistent Interpretation of HTTP Requests ('HTTP Request/Response Smuggling'): The product acts as an intermediary HTTP agent (such as a proxy or firewall) in the data flow between two entities such as a client and server, but it does not interpret malformed HTTP requests or responses in ways that are consistent with how the messages will be processed by those entities that are at the ultimate destination.