CVE-2026-9540: vllm: unauthenticated DoS in OpenAI-compatible serving path

MEDIUM
Published May 26, 2026
CISO Take

CVE-2026-9540 is an unauthenticated denial-of-service vulnerability in the OpenAI-compatible HTTP serving path of vLLM 0.19.0. It stems from improper resource release (CWE-404), and any remote attacker can trigger it with no credentials and no user interaction. vLLM is among the most widely deployed open-source LLM inference engines. Production AI services built on its /v1 endpoints, including RAG pipelines, copilot backends, and agentic platforms, are therefore exposed to availability disruption. The advisory states that an exploit is publicly available. No official patch exists yet: the fix is an open pull request (#37594) still awaiting maintainer acceptance. Until a fixed release is cut, place vLLM behind an authenticated reverse proxy with rate limiting and monitor PR #37594 closely.

Sources: NVD, GitHub Advisory, ATLAS

What is the risk?

The medium CVSS score (5.3) understates practical risk because three factors compound:

- No authentication is required on the default serving interface.
- The advisory states that an exploit is publicly available.
- No official patch exists, only an unmerged PR.

Any vLLM 0.19.0 instance reachable from the internet, or from other tenants on a shared network, is a viable target. Organizations running vLLM with direct network exposure or inside shared inference platforms carry the highest risk. Internal-only deployments behind authenticated API gateways have substantially reduced exposure, but it is not eliminated.
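To check whether one of your own endpoints is in the exposed group, you can request /v1/models without credentials and see whether the server answers. The sketch below uses only the Python standard library and is intended for auditing hosts you own. It treats HTTP 200 as "reachable without authentication" and 401/403 as "authentication enforced". The function name answers_without_auth and the example address are hypothetical; they are not part of vLLM.

```python
import urllib.error
import urllib.request


def answers_without_auth(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/v1/models succeeds with no credentials.

    Hypothetical audit helper for endpoints you own. A 401 or 403 means
    something in front of (or inside) the server enforces authentication.
    """
    url = base_url.rstrip("/") + "/v1/models"
    # Deliberately send no Authorization header.
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            return False
        raise  # other statuses (404, 5xx) need a human to look at them
```

For example, answers_without_auth("http://10.0.0.5:8000") (a hypothetical internal address) returning True means the serving path is open to anyone who can reach that port.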

Attack Kill Chain

Target Discovery
Adversary scans for internet-exposed vLLM instances by probing the default port 8000 or by fingerprinting OpenAI-compatible /v1/models responses, then singles out deployments running vLLM 0.19.0.
AML.T0006
Exploit Delivery
Adversary sends crafted HTTP requests to the OpenAI-compatible serving path that trigger the CWE-404 improper resource release condition, requiring no credentials.
AML.T0049
Service Disruption
Resource exhaustion causes the vLLM inference server to become unresponsive or crash, denying AI inference to all downstream consumers until the process is restarted.
AML.T0029
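The Service Disruption stage shows up as a sustained burst of 5xx responses, which is what the detection guidance in step 6 targets. Below is a minimal sliding-window sketch. It assumes you can feed it (timestamp, status) pairs from proxy or access logs. ErrorRateAlarm and its default thresholds are hypothetical and need tuning per deployment. The sketch does not cover the latency-spike or process-restart signals, which need separate checks.

```python
from collections import deque


class ErrorRateAlarm:
    """Sliding-window alarm on the share of HTTP 5xx responses.

    window_s, threshold, and min_requests are illustrative defaults,
    not values taken from the advisory.
    """

    def __init__(self, window_s: float = 60.0, threshold: float = 0.2,
                 min_requests: int = 20) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self.min_requests = min_requests
        self._events: deque = deque()  # (timestamp, is_5xx) pairs
        self._errors = 0

    def record(self, timestamp: float, status: int) -> bool:
        """Record one response and return True if the alarm should fire."""
        is_error = 500 <= status <= 599
        self._events.append((timestamp, is_error))
        self._errors += int(is_error)
        # Drop events that have slid out of the window.
        while self._events and self._events[0][0] <= timestamp - self.window_s:
            _, old_error = self._events.popleft()
            self._errors -= int(old_error)
        if len(self._events) < self.min_requests:
            return False  # too little traffic to judge
        return self._errors / len(self._events) >= self.threshold
```

The min_requests floor keeps a single failed request on an idle node from paging anyone. The time-based window lets the alarm clear on its own once traffic recovers.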

What systems are affected?

Package: vllm
Ecosystem: pip
Vulnerable range: 0.19.0 (the version named in the advisory)
Patched: No patch
Package activity: 80.8K · 127 dependents · pushed 2 days ago · 55% patched · ~33 days to patch

Do you run vLLM 0.19.0, the version named in the advisory? Assume you are affected.

Severity & Risk

CVSS 3.1
5.3 / 10
EPSS
N/A
Exploitation Status
No known exploitation
Sophistication
Trivial

Attack Surface

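Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): Low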

What should I do?

  1. Immediate — place vLLM behind an authenticated reverse proxy or API gateway (bearer token, mTLS, or IP allowlist) to prevent unauthenticated access to serving endpoints.

  2. Apply rate limiting at the gateway to throttle request patterns that could trigger resource exhaustion.

  3. Restrict the vLLM API port to internal network interfaces unless external access is strictly required.

  4. Set the --max-num-seqs and --max-num-batched-tokens launch flags to bound per-instance resource consumption.

  5. Track https://github.com/vllm-project/vllm/pull/37594 and apply the patch as soon as a fixed release is published.

  6. For detection: alert on sustained HTTP 5xx error rates, abnormal request latency spikes, or vLLM process restarts on inference nodes.
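Steps 1 and 2 can be combined into a single admission decision made before a request reaches vLLM. Below is a minimal, framework-agnostic Python sketch. It assumes two things: clients send `Authorization: Bearer <key>`, and a client identifier (such as the source IP) is available. Gateway, TokenBucket, and admit are hypothetical names. In practice this logic usually lives in an existing reverse proxy or API gateway rather than in custom code.

```python
from __future__ import annotations

import hmac
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    rate: float      # tokens refilled per second
    capacity: float  # largest burst allowed
    updated: float   # timestamp of the last refill
    tokens: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # a new client starts with a full burst

    def take(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class Gateway:
    """Hypothetical gate: bearer-token check, then per-client rate limit."""

    def __init__(self, api_keys: list[str], rate: float, burst: float) -> None:
        self._keys = [key.encode() for key in api_keys]
        self._rate = rate
        self._burst = burst
        self._buckets: dict[str, TokenBucket] = {}

    def admit(self, authorization: str | None, client_id: str, now: float) -> int:
        """Return the HTTP status to answer with; 200 means forward to vLLM."""
        if not authorization or not authorization.startswith("Bearer "):
            return 401
        presented = authorization[len("Bearer "):].encode()
        # Each comparison is constant-time, so a key cannot be guessed byte by byte.
        if not any(hmac.compare_digest(presented, key) for key in self._keys):
            return 401
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = self._buckets[client_id] = TokenBucket(self._rate, self._burst, now)
        return 200 if bucket.take(now) else 429
```

Rejecting unauthenticated requests before rate limiting means anonymous traffic never consumes a bucket, and the bucket caps how fast even a valid key can hit the serving path. Passing `now` explicitly keeps the logic testable. A real deployment would pass time.monotonic().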

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.9.3 - Availability of AI systems
NIST AI RMF
MANAGE 2.2 - Risk Response — Treat
OWASP LLM Top 10
LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2026-9540?

CVE-2026-9540 is an unauthenticated denial-of-service flaw (CWE-404, improper resource release) in the OpenAI-compatible serving path of vLLM 0.19.0, rated CVSS 5.3 (MEDIUM). An attacker needs only network access to the /v1 endpoints. The fix is still pending in PR #37594.

Is CVE-2026-9540 actively exploited?

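No active exploitation of CVE-2026-9540 has been confirmed. However, the advisory says an exploit is publicly available and no fixed release exists. Organizations should therefore apply the mitigations listed under "What should I do?" proactively.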

How to fix CVE-2026-9540?

No fixed release exists yet. Until one ships:

- Place vLLM behind an authenticated reverse proxy or API gateway with rate limiting.
- Bind the API port to internal interfaces.
- Bound per-instance load with the --max-num-seqs and --max-num-batched-tokens launch flags.
- Alert on sustained 5xx error rates, latency spikes, or process restarts.

Upgrade once the fix from https://github.com/vllm-project/vllm/pull/37594 appears in a published release.

What systems are affected by CVE-2026-9540?

This vulnerability affects the following AI/ML architecture patterns: LLM model serving, OpenAI-compatible API endpoints, RAG pipelines, AI inference pipelines, and agentic AI systems.

What is the CVSS score for CVE-2026-9540?

CVE-2026-9540 has a CVSS v3.1 base score of 5.3 (MEDIUM).

AI Security Impact

Affected AI Architectures

LLM model serving, OpenAI-compatible API endpoints, RAG pipelines, AI inference pipelines, Agentic AI systems

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service
AML.T0034.000 Excessive Queries
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.9.3
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM04

Technical Details

Original Advisory

A vulnerability was identified in vllm-project vllm 0.19.0. This issue affects some unknown processing of the component OpenAI-compatible Serving Path. Such manipulation leads to denial of service. It is possible to launch the attack remotely. The exploit is publicly available and might be used. The pull request to fix this issue awaits acceptance.

Exploitation Scenario

An adversary identifies an internet-exposed vLLM 0.19.0 instance by scanning for the default API port (8000) or fingerprinting OpenAI-compatible /v1/models responses. Using the publicly referenced exploit, they send crafted HTTP requests to the OpenAI-compatible serving path that trigger the CWE-404 condition — likely exhausting connection handles, async worker slots, or memory allocations that are never properly released. The vLLM process becomes unresponsive or crashes, denying inference service to all legitimate consumers. Because no authentication is required, the attack demands only network reachability. The adversary can loop the attack to maintain persistent denial of service, effectively holding AI-dependent workloads hostage until the service is patched or traffic-filtered.
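The CWE-404 pattern this scenario speculates about can be illustrated generically. The sketch below is hypothetical and is not vLLM code, since the advisory does not describe the actual root cause. It models a fixed pool of worker slots. leaky_handler releases its slot only on the success path, so every request whose client disconnects mid-response keeps a slot forever. safe_handler releases in a finally block. SlotPool, PoolExhausted, and the handler names are all illustrative.

```python
import asyncio
from typing import Awaitable, Callable


class PoolExhausted(Exception):
    """Raised when every slot is taken; a real server would answer 503."""


class SlotPool:
    """Hypothetical fixed-size pool of worker slots (illustration only)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.in_use = 0

    def acquire(self) -> None:
        if self.in_use >= self.size:
            raise PoolExhausted()
        self.in_use += 1

    def release(self) -> None:
        self.in_use -= 1


Work = Callable[[], Awaitable[str]]


async def leaky_handler(pool: SlotPool, work: Work) -> str:
    pool.acquire()
    result = await work()  # if this raises, release() below never runs
    pool.release()
    return result


async def safe_handler(pool: SlotPool, work: Work) -> str:
    pool.acquire()
    try:
        return await work()
    finally:
        pool.release()  # runs on success, exception, or cancellation


async def client_disconnects() -> str:
    await asyncio.sleep(0)
    raise ConnectionResetError("client went away mid-response")


async def slots_left_in_use(handler, requests: int, size: int = 4) -> int:
    """Send `requests` doomed requests and report how many slots stay held."""
    pool = SlotPool(size)
    for _ in range(requests):
        try:
            await handler(pool, client_disconnects)
        except (ConnectionResetError, PoolExhausted):
            pass
    return pool.in_use
```

With the default size of 4, asyncio.run(slots_left_in_use(leaky_handler, 10)) returns 4. After four aborted requests the pool is exhausted, and every later request, including legitimate ones, is rejected. The same call with safe_handler returns 0. This is why a leak of this kind needs only cheap, repeated requests rather than heavy load.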

Weaknesses (CWE)

CWE-404: Improper Resource Shutdown or Release

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L
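The 5.3 base score follows mechanically from this vector. Only Availability is impacted, and only at Low. The worked numbers are:

- Impact = 6.42 × 0.22 ≈ 1.41
- Exploitability = 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.89
- Their sum is ≈ 5.30, which rounds up to 5.3.

The sketch below implements the CVSS v3.1 base-score equations and Roundup function with the metric weights published by FIRST in the v3.1 specification. The names base_score and roundup are illustrative, not from any particular library.

```python
import math

# Base metric weights from the FIRST CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the v3.1 base score from a 'CVSS:3.1/AV:..' vector string."""
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    changed = parts["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[parts["PR"]]
    # Impact Sub-Score combines the three CIA impacts.
    iss = 1 - (1 - CIA[parts["C"]]) * (1 - CIA[parts["I"]]) * (1 - CIA[parts["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[parts["AV"]] * AC[parts["AC"]] * pr * UI[parts["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))
```

base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L") returns 5.3. Changing A:L to A:H in the same vector would raise the score to 7.5. That gap is why the risk section argues the practical impact may exceed what the Low availability rating suggests.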

Timeline

Published
May 26, 2026
Last Modified
May 26, 2026
First Seen
May 26, 2026
