CVE-2026-9540: vLLM unauthenticated DoS

Q: Is CVE-2026-9540 actively exploited?

No confirmed active exploitation of CVE-2026-9540 has been reported. Because no fixed release is available yet, organizations should apply the mitigations below proactively rather than wait for a patch.

Q: How to fix CVE-2026-9540?

1. Immediate: place vLLM behind an authenticated reverse proxy or API gateway (bearer token, mTLS, or IP allowlist) to prevent unauthenticated access to serving endpoints.
2. Apply rate limiting at the gateway to throttle request patterns that could trigger resource exhaustion.
3. Restrict the vLLM API port to internal network interfaces unless external access is strictly required.
4. Set --max-num-seqs and --max-num-batched-tokens launch flags to bound per-instance resource consumption.
5. Track https://github.com/vllm-project/vllm/pull/37594 and apply the patch as soon as a fixed release is published.
6. For detection: alert on sustained HTTP 5xx error rates, abnormal request latency spikes, or vLLM process restarts on inference nodes.
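Steps 3 and 4 can be captured in a small launch helper. The following is a minimal sketch, assuming the `vllm serve` entry point and its `--host`, `--port`, `--max-num-seqs`, and `--max-num-batched-tokens` flags; the model name and the numeric limits are hypothetical placeholders, not recommendations from the advisory, and should be tuned to the hardware in use.

```python
import shlex


def build_vllm_command(
    model: str,
    *,
    host: str = "127.0.0.1",             # loopback only: reachable via a local proxy
    port: int = 8000,
    max_num_seqs: int = 64,              # hypothetical cap on concurrent sequences
    max_num_batched_tokens: int = 8192,  # hypothetical cap on tokens per batch
) -> list[str]:
    """Build an argv list for `vllm serve` with an internal bind and explicit limits."""
    if max_num_seqs <= 0 or max_num_batched_tokens <= 0:
        raise ValueError("resource limits must be positive integers")
    return [
        "vllm", "serve", model,
        "--host", host,
        "--port", str(port),
        "--max-num-seqs", str(max_num_seqs),
        "--max-num-batched-tokens", str(max_num_batched_tokens),
    ]


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder model identifier.
    print(shlex.join(build_vllm_command("my-org/my-model")))
```

Binding to a loopback or internal address means the only route to the server is through the authenticated gateway from step 1. These limits bound scheduler load, but whether they also bound the specific resource leaked by this CVE is not stated in the advisory.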

Q: What systems are affected by CVE-2026-9540?

This vulnerability affects the following AI/ML architecture patterns: LLM model serving, OpenAI-compatible API endpoints, RAG pipelines, AI inference pipelines, Agentic AI systems.

Q: What is the CVSS score for CVE-2026-9540?

CVE-2026-9540 has a CVSS v3.1 base score of 5.3 (MEDIUM). The EPSS exploitation probability is 0.43%.

CISO Take

CVE-2026-9540 is an unauthenticated denial-of-service vulnerability in vLLM 0.19.0's OpenAI-compatible HTTP serving path, caused by improper resource release (CWE-404) that any remote attacker can trigger with zero credentials and no user interaction. vLLM is among the most widely deployed open-source LLM inference engines, meaning production AI services — including RAG pipelines, copilot backends, and agentic platforms built on its /v1 endpoints — face full availability disruption. The exploit is publicly referenced in the advisory, and no official patch exists yet: the fix is an open pull request (#37594) still awaiting maintainer acceptance, leaving deployments exposed. Until a fixed release is cut, place vLLM behind an authenticated reverse proxy with rate limiting and monitor PR #37594 closely.

Sources: NVD, GitHub Advisory, ATLAS

What is the risk?

Medium CVSS (5.3) understates practical risk due to three compounding factors: zero authentication required on the default serving interface, a publicly available exploit referenced in the advisory, and no official patch — only an unmerged PR. Any internet-exposed or multi-tenant-accessible vLLM 0.19.0 instance is a viable target. Organizations running vLLM with direct network exposure or inside shared inference platforms carry the highest risk. Internal-only deployments protected by authenticated API gateways have substantially reduced — but not eliminated — exposure.

How does the attack unfold?

Target Discovery

Adversary scans for internet-exposed vLLM instances by probing default port 8000 or fingerprinting OpenAI-compatible /v1/models responses to confirm vLLM 0.19.0 is running.

AML.T0006

Exploit Delivery

Adversary sends crafted HTTP requests to the OpenAI-compatible serving path that trigger the CWE-404 improper resource release condition, requiring no credentials.

AML.T0049

Service Disruption

Resource exhaustion causes the vLLM inference server to become unresponsive or crash, denying AI inference to all downstream consumers until the process is restarted.

AML.T0029
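Defenders can run the same discovery step against their own hosts to see what an unauthenticated scanner would see. The sketch below requests `/v1/models` without credentials and classifies the result; the function names and the classification labels are illustrative assumptions, and it should only be pointed at systems you operate.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status (or None for no response) to an exposure label."""
    if status is None:
        return "unreachable"
    if status in (401, 403):
        return "auth-required"
    if 200 <= status < 300:
        return "open"
    return "inconclusive"


def probe_models_endpoint(base_url, timeout=3.0):
    """Request /v1/models without credentials and classify the response."""
    url = base_url.rstrip("/") + "/v1/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as exc:
        # HTTPError subclasses URLError, so it must be handled first.
        return classify_status(exc.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"


if __name__ == "__main__":
    # Placeholder address: replace with a host you are authorized to test.
    print(probe_models_endpoint("http://127.0.0.1:8000"))
```

A result of "open" from outside the trusted network suggests the serving path is reachable without authentication and should be moved behind a gateway.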

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
vLLM	pip	<= 0.19.0	No patch

If you run vLLM 0.19.0 or earlier, you are affected.
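To check an environment against the range in the table above, the installed version can be compared with 0.19.0. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the parser is deliberately simple (it reads only the leading numeric release segments and treats anything it cannot parse as affected). Note that the table lists "<= 0.19.0" while the original advisory text names only 0.19.0.

```python
from importlib import metadata

# Upper bound of the vulnerable range as listed in the table above.
LAST_KNOWN_AFFECTED = (0, 19, 0)


def parse_release(version: str) -> tuple:
    """Extract leading numeric segments, e.g. '0.19.0.post1' -> (0, 19, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_affected(version: str) -> bool:
    """Return True when the release tuple is at or below the affected bound."""
    return parse_release(version) <= LAST_KNOWN_AFFECTED


def installed_vllm_status() -> str:
    """Report whether the vllm package in this environment is in range."""
    try:
        version = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vllm is not installed in this environment"
    state = "in the affected range" if is_affected(version) else "above the affected range"
    return f"vllm {version} is {state}"


if __name__ == "__main__":
    print(installed_vllm_status())
```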

How severe is it?

CVSS 3.1: 5.3 / 10

EPSS: 0.4% chance of exploitation in 30 days (higher than 34% of all CVEs)

Source: EPSS v3, FIRST.org

Exploitation Status: Exploit Available

Exploitation: MEDIUM

Sophistication: Trivial

Exploitation Confidence: medium

CISA SSVC: Public PoC

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

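Attack Vector (AV): Network

Attack Complexity (AC): Low

Privileges Required (PR): None

User Interaction (UI): None

Scope (S): Unchanged

Confidentiality (C): None

Integrity (I): None

Availability (A): Low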
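These metrics correspond to the vector CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L. As a worked check, the sketch below recomputes the base score from the metric weights and rounding rule in the CVSS v3.1 specification. It covers only the Scope: Unchanged case, and the function and table names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged values for PR).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Compute the CVSS v3.1 base score for a Scope: Unchanged vector."""
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[c]) * (1 - cia[i]) * (1 - cia[a])
    impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][av] * WEIGHTS["AC"][ac]
        * WEIGHTS["PR"][pr] * WEIGHTS["UI"][ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    # AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L
    print(base_score_scope_unchanged("N", "L", "N", "N", "N", "N", "L"))  # 5.3
```

The exploitability sub-score is at its maximum for this vector (about 3.9), so the MEDIUM rating comes entirely from the Low availability impact.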

What should I do?

1. Immediate: place vLLM behind an authenticated reverse proxy or API gateway (bearer token, mTLS, or IP allowlist) to prevent unauthenticated access to serving endpoints.
2. Apply rate limiting at the gateway to throttle request patterns that could trigger resource exhaustion.
3. Restrict the vLLM API port to internal network interfaces unless external access is strictly required.
4. Set --max-num-seqs and --max-num-batched-tokens launch flags to bound per-instance resource consumption.
5. Track https://github.com/vllm-project/vllm/pull/37594 and apply the patch as soon as a fixed release is published.
6. For detection: alert on sustained HTTP 5xx error rates, abnormal request latency spikes, or vLLM process restarts on inference nodes.
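The detection step can be prototyped as a sliding-window check over gateway access logs. This is a minimal sketch, not a replacement for a monitoring stack; the class name, the 60-second window, the 20% threshold, and the minimum sample size are all assumed values to be tuned per deployment.

```python
from collections import deque


class ErrorRateMonitor:
    """Track the share of HTTP 5xx responses over a sliding time window."""

    def __init__(self, window_seconds=60.0, threshold=0.2, min_requests=20):
        self.window_seconds = window_seconds
        self.threshold = threshold        # alert when 5xx share reaches this value
        self.min_requests = min_requests  # avoid alerting on tiny samples
        self._events = deque()            # (timestamp, is_server_error) pairs

    def _evict(self, now):
        # Drop events that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def record(self, timestamp, status):
        """Record one response; timestamps are expected in ascending order."""
        self._events.append((timestamp, 500 <= status <= 599))
        self._evict(timestamp)

    def should_alert(self, now):
        """Return True when the 5xx share in the window reaches the threshold."""
        self._evict(now)
        total = len(self._events)
        if total < self.min_requests:
            return False
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / total >= self.threshold
```

Feeding it parsed log lines, for example `monitor.record(ts, status)` per request, and polling `should_alert` gives a "sustained 5xx" signal. Latency spikes and process restarts would need separate signals.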

What does CISA's SSVC say?

Decision: Track*

Exploitation: poc

Automatable: Yes

Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Tags: DoS, Inference API

AML.T0029 - Denial of AI Service

AML.T0034.000 - Excessive Queries

AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.9.3 - Availability of AI systems

NIST AI RMF

MANAGE 2.2 - Risk Response — Treat

OWASP LLM Top 10

LLM04 - Model Denial of Service

What is the AI security impact?

Affected AI Architectures

LLM model serving, OpenAI-compatible API endpoints, RAG pipelines, AI inference pipelines, Agentic AI systems

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service

AML.T0034.000 Excessive Queries

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: A.9.3

NIST AI RMF: MANAGE 2.2

OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

A vulnerability was identified in vllm-project vllm 0.19.0. This issue affects some unknown processing of the component OpenAI-compatible Serving Path. Such manipulation leads to denial of service. It is possible to launch the attack remotely. The exploit is publicly available and might be used. The pull request to fix this issue awaits acceptance.

Exploitation Scenario

An adversary identifies an internet-exposed vLLM 0.19.0 instance by scanning for the default API port (8000) or fingerprinting OpenAI-compatible /v1/models responses. Using the publicly referenced exploit, they send crafted HTTP requests to the OpenAI-compatible serving path that trigger the CWE-404 condition — likely exhausting connection handles, async worker slots, or memory allocations that are never properly released. The vLLM process becomes unresponsive or crashes, denying inference service to all legitimate consumers. Because no authentication is required, the attack demands only network reachability. The adversary can loop the attack to maintain persistent denial of service, effectively holding AI-dependent workloads hostage until the service is patched or traffic-filtered.
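Gateway rate limiting is aimed at the looped request pattern in this scenario. A minimal per-client token bucket sketch follows; the class names, the refill rate, and the burst capacity are assumptions for illustration. A production gateway would normally use its built-in limiter instead.

```python
import time


class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now):
        # Refill based on elapsed time, then spend one token if available.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """Keep one bucket per client identifier (for example, a source IP)."""

    def __init__(self, rate=5.0, capacity=10.0):
        self.rate = rate
        self.capacity = capacity
        self._buckets = {}

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self._buckets[client_id] = bucket
        return bucket.allow(now)
```

The bucket map in this sketch grows with the number of distinct client identifiers, so a real deployment would need eviction of idle entries. Otherwise the limiter itself becomes an unreleased resource of the kind CWE-404 describes.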

Weaknesses (CWE)

CWE-404 Improper Resource Shutdown or Release (Primary)

CWE-404 — Improper Resource Shutdown or Release: The product does not release or incorrectly releases a resource before it is made available for re-use.

[Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, languages such as Java, Ruby, and Lisp perform automatic garbage collection that releases memory for objects that have been deallocated.
[Implementation] It is good practice to be responsible for freeing all resources you allocate and to be consistent with how and where you free memory in a function. If you allocate memory that you intend to free upon completion of the function, you must be sure to free the memory at all exit points for that function including error conditions.
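The implementation guidance above, releasing a resource at every exit point including error paths, can be illustrated generically. This sketch is not vLLM's code and does not reproduce the actual defect, whose details the advisory does not give; `SlotPool` and `handle` are hypothetical names standing in for any bounded per-request resource.

```python
import threading
from contextlib import contextmanager


class SlotPool:
    """A bounded pool of request slots that must be returned after use."""

    def __init__(self, size):
        self._semaphore = threading.BoundedSemaphore(size)
        self._lock = threading.Lock()
        self.in_use = 0

    @contextmanager
    def slot(self):
        if not self._semaphore.acquire(blocking=False):
            raise RuntimeError("no free slots")
        with self._lock:
            self.in_use += 1
        try:
            yield
        finally:
            # Runs on normal return and on any exception, so the slot
            # is always handed back to the pool.
            with self._lock:
                self.in_use -= 1
            self._semaphore.release()


def handle(pool, payload):
    """Process one request while holding a slot."""
    with pool.slot():
        if payload is None:
            raise ValueError("malformed request")
        return len(payload)
```

Without the `try`/`finally`, every malformed request would permanently consume a slot, and a pool of size N would stop serving after N bad requests. That is the general failure shape behind a CWE-404 denial of service.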

Source: MITRE CWE corpus.