CVE-2026-9540: vllm: unauthenticated DoS in OpenAI-compatible serving path
Severity: MEDIUM
CVE-2026-9540 is an unauthenticated denial-of-service vulnerability in the OpenAI-compatible HTTP serving path of vLLM 0.19.0. It is caused by improper resource release (CWE-404), and any remote attacker can trigger it without credentials or user interaction. vLLM is among the most widely deployed open-source LLM inference engines, so production AI services built on its /v1 endpoints (RAG pipelines, copilot backends, agentic platforms) face availability disruption. The advisory states that an exploit is publicly available, and no official patch exists yet: the fix is an open pull request (#37594) still awaiting maintainer acceptance, which leaves deployments exposed. Until a fixed release is cut, place vLLM behind an authenticated reverse proxy with rate limiting and monitor PR #37594 closely.
What is the risk?
The Medium CVSS score (5.3) understates practical risk because three factors compound: no authentication is required on the default serving interface, the advisory references a publicly available exploit, and there is no official patch, only an unmerged PR. Any internet-exposed or multi-tenant-accessible vLLM 0.19.0 instance is a viable target. Organizations running vLLM with direct network exposure or inside shared inference platforms carry the highest risk. Internal-only deployments protected by authenticated API gateways have substantially reduced, but not eliminated, exposure.
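To make the "authenticated API gateway" case concrete, here is a minimal sketch of the two checks such a gateway would perform before forwarding a request to vLLM: a bearer-token comparison and a token-bucket rate limit. This is not part of vLLM, and `TokenBucket` and `gate_request` are hypothetical names chosen for illustration. A real deployment would normally rely on an existing reverse proxy or API gateway rather than hand-written code.

```python
import hmac
import time


class TokenBucket:
    """Token bucket: refills at rate_per_sec, holds at most `burst` tokens."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.updated = clock()

    def allow(self):
        """Consume one token if available; return whether the request may pass."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def gate_request(authorization_header, expected_token, bucket):
    """Return 401 or 429 to reject, or None to forward the request upstream."""
    supplied = authorization_header or ""
    prefix = "Bearer "
    if not supplied.startswith(prefix):
        return 401
    token = supplied[len(prefix):]
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(token.encode(), expected_token.encode()):
        return 401
    # Only authenticated callers consume rate-limit budget.
    if not bucket.allow():
        return 429
    return None
```

In practice the gateway would keep one bucket per client or per token so that a single noisy caller cannot exhaust the budget of the others.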
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
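| vllm | pip | 0.19.0 (only version named in the advisory) | No patch |
If you run vllm 0.19.0, you are affected. The advisory does not state whether other versions are vulnerable.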
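One way to check a host is sketched below, assuming vLLM was installed as the `vllm` distribution in the current Python environment. `REPORTED_VERSION` and both helper names are illustrative. Because the advisory names only 0.19.0 and gives no range, a non-match should not be read as proof that a host is safe.

```python
from importlib import metadata

# The only version named in the advisory; the full vulnerable range is unknown.
REPORTED_VERSION = "0.19.0"


def installed_vllm_version(get_version=metadata.version):
    """Return the installed vllm version string, or None if it is not installed."""
    try:
        return get_version("vllm")
    except metadata.PackageNotFoundError:
        return None


def matches_reported_version(version):
    """True if `version` equals the reported version, ignoring a local label."""
    if version is None:
        return False
    # Builds may carry a local label such as "+cu121"; compare the public part.
    return version.split("+", 1)[0] == REPORTED_VERSION


# Usage: matches_reported_version(installed_vllm_version())
```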
What should I do?
1. Immediate: place vLLM behind an authenticated reverse proxy or API gateway (bearer token, mTLS, or IP allowlist) to prevent unauthenticated access to serving endpoints.
2. Apply rate limiting at the gateway to throttle request patterns that could trigger resource exhaustion.
3. Restrict the vLLM API port to internal network interfaces unless external access is strictly required.
4. Set the --max-num-seqs and --max-num-batched-tokens launch flags to bound per-instance resource consumption.
5. Track https://github.com/vllm-project/vllm/pull/37594 and apply the patch as soon as a fixed release is published.
6. For detection: alert on sustained HTTP 5xx error rates, abnormal request latency spikes, or vLLM process restarts on inference nodes.
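The detection step above (sustained HTTP 5xx rates) can be sketched as a sliding-window monitor. The class name, window length, and thresholds are illustrative assumptions, not recommended values. The sketch expects timestamp and status-code pairs from whatever access log or metrics source sits in front of vLLM.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the share of 5xx responses over a sliding time window."""

    def __init__(self, window_seconds=60.0, threshold=0.5, min_requests=20):
        self.window = window_seconds
        self.threshold = threshold        # alert when 5xx share reaches this value
        self.min_requests = min_requests  # ignore windows with too little traffic
        self.events = deque()             # (timestamp, is_5xx) in arrival order
        self.errors = 0

    def record(self, timestamp, status_code):
        """Add one observed response and drop entries older than the window."""
        is_error = 500 <= status_code <= 599
        self.events.append((timestamp, is_error))
        self.errors += int(is_error)
        cutoff = timestamp - self.window
        while self.events and self.events[0][0] < cutoff:
            _, was_error = self.events.popleft()
            self.errors -= int(was_error)

    def should_alert(self):
        """True when the window has enough traffic and the 5xx share is high."""
        total = len(self.events)
        if total < self.min_requests:
            return False
        return self.errors / total >= self.threshold
```

The `min_requests` floor keeps a single failed request on an idle node from raising an alert. Latency spikes and process restarts would need separate signals.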
Frequently Asked Questions
Is CVE-2026-9540 actively exploited?
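No confirmed active exploitation of CVE-2026-9540 has been reported. Because an exploit is publicly available and no patched release exists yet, organizations should still apply the mitigations proactively and patch once a fixed release is published.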
What systems are affected by CVE-2026-9540?
This vulnerability affects the following AI/ML architecture patterns: LLM model serving, OpenAI-compatible API endpoints, RAG pipelines, AI inference pipelines, Agentic AI systems.
What is the CVSS score for CVE-2026-9540?
CVE-2026-9540 has a CVSS v3.1 base score of 5.3 (MEDIUM).
AI Security Impact
MITRE ATLAS Techniques
- AML.T0029 Denial of AI Service
- AML.T0034.000 Excessive Queries
- AML.T0049 Exploit Public-Facing Application
Technical Details
Original Advisory
A vulnerability was identified in vllm-project vllm 0.19.0. This issue affects some unknown processing of the component OpenAI-compatible Serving Path. Such manipulation leads to denial of service. It is possible to launch the attack remotely. The exploit is publicly available and might be used. The pull request to fix this issue awaits acceptance.
Exploitation Scenario
An adversary identifies an internet-exposed vLLM 0.19.0 instance by scanning for the default API port (8000) or fingerprinting OpenAI-compatible /v1/models responses. Using the publicly referenced exploit, they send crafted HTTP requests to the OpenAI-compatible serving path that trigger the CWE-404 condition — likely exhausting connection handles, async worker slots, or memory allocations that are never properly released. The vLLM process becomes unresponsive or crashes, denying inference service to all legitimate consumers. Because no authentication is required, the attack demands only network reachability. The adversary can loop the attack to maintain persistent denial of service, effectively holding AI-dependent workloads hostage until the service is patched or traffic-filtered.
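Defenders can run the same kind of probe against hosts they operate to see whether the serving path answers anonymous requests. The sketch below assumes the `/v1/models` path mentioned above. The function name and the injectable `opener` parameter are hypothetical, added so the check can be exercised without a network. It sends a single GET request and nothing resembling the exploit.

```python
import urllib.error
import urllib.request


def answers_without_credentials(base_url, timeout=5.0,
                                opener=urllib.request.urlopen):
    """True if GET <base_url>/v1/models returns 200 with no Authorization header."""
    url = base_url.rstrip("/") + "/v1/models"
    request = urllib.request.Request(url, method="GET")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # 401, 403 and similar: something in front of vLLM rejects anonymous calls.
        return False
    except OSError:
        # Connection refused, timeout, DNS failure: not reachable from here.
        return False


# Usage, only against systems you operate:
# answers_without_credentials("http://127.0.0.1:8000")
```

A `True` result from an untrusted network position means the instance is reachable in the way the scenario describes.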
Weaknesses (CWE)
CWE-404: Improper Resource Shutdown or Release
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L
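The 5.3 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The sketch below covers base metrics only, with no temporal or environmental metrics. The function names are illustrative, and the metric weights are the constants from the CVSS v3.1 specification.

```python
import math

# Base metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """Round up to one decimal place as defined in CVSS v3.1."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score for a base-metric vector string."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(p.split(":", 1) for p in parts[1:])
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]

    # Impact sub-score from the confidentiality, integrity, availability weights.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L"))  # 5.3
```

Here exploitability is about 3.89 and impact about 1.41. The only nonzero impact metric is A:L, which is why the score stays in the Medium band even though the attack needs no privileges.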
Related Vulnerabilities
| CVE | CVSS | Summary | Relation |
|---|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server | Same package: vllm |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue | Same package: vllm |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference | Same package: vllm |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access | Same package: vllm |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ | Same package: vllm |