CVE-2026-34756: vLLM DoS — MEDIUM

Q: Is CVE-2026-34756 actively exploited?

No confirmed active exploitation of CVE-2026-34756 has been reported, but organizations should still patch proactively.

Q: How do I fix CVE-2026-34756?

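1. PATCH: Upgrade to vLLM >= 0.19.0, which adds Pydantic Field upper-bound validation on n in ChatCompletionRequest and CompletionRequest.
2. WORKAROUND (if immediate patching is blocked): Deploy an API gateway (Kong, Nginx, AWS API GW) with JSON body inspection that enforces n <= 128 (or your operational maximum) and rejects violations with HTTP 400.
3. NETWORK HARDENING: Never expose vLLM port 8000 directly to the internet; require an authenticated reverse proxy in front of all inference endpoints.
4. DETECTION: Enable HTTP request body logging and alert on n values > 1000 in /v1/chat/completions and /v1/completions payloads; monitor for OOM kills via dmesg/systemd journal and for vLLM process restarts.
5. DEFENSE-IN-DEPTH: Implement per-IP and per-token rate limiting at the load balancer layer; configure container memory limits and auto-restart policies to minimize recovery time after a crash.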
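The gateway workaround above amounts to a small check that runs before a request is proxied to vLLM. Here is a minimal, gateway-agnostic sketch. It assumes the gateway can hand a hook the request path and raw body. The function name check_request and the MAX_N value are hypothetical. They are not part of vLLM, Kong, Nginx, or AWS API Gateway.

```python
import json

# Hypothetical ceiling: 128 mirrors the example above; set your operational maximum.
MAX_N = 128
PROTECTED_PATHS = {"/v1/chat/completions", "/v1/completions"}


def check_request(path: str, body: bytes, max_n: int = MAX_N) -> tuple[int, str]:
    """Return (http_status, reason): 200 means forward upstream, 400 means reject."""
    if path not in PROTECTED_PATHS:
        return 200, "not an inference endpoint"
    try:
        payload = json.loads(body)
    except ValueError:
        # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return 400, "body is not valid JSON"
    if not isinstance(payload, dict):
        return 400, "body must be a JSON object"
    n = payload.get("n")
    if n is None:
        return 200, "n omitted; upstream default applies"
    # bool is a subclass of int in Python, so reject it explicitly; floats such
    # as 1e9 are rejected too, which errs on the side of caution.
    if isinstance(n, bool) or not isinstance(n, int):
        return 400, "n must be an integer"
    if not 1 <= n <= max_n:
        return 400, f"n must be between 1 and {max_n}"
    return 200, "ok"
```

The gateway and vLLM should parse the body the same way. Python's json module keeps the last value when a key is duplicated, so a gateway built on a different JSON parser could disagree with the upstream about which n applies.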

CISO Take

Any public-facing vLLM inference server running a version below 0.19.0 can be taken down by a single unauthenticated HTTP request that sets n to an astronomically large integer. No credentials, brute force, or sustained request volume are required. Upgrade to vLLM 0.19.0 immediately. If patching is delayed, deploy an API gateway or WAF rule that enforces a hard ceiling on n (e.g., n ≤ 128) at the HTTP layer. Treat this as critical despite the CVSS 6.5 rating: a trivial, single-request outage of your LLM inference tier is a production incident, not a medium finding.

What is the risk?

CVSS 6.5 (Medium) materially understates the operational risk. Exploitability is trivial: a single unauthenticated POST request is enough to crash the vLLM process via an OOM kill. The attack targets the control plane (the Python asyncio event loop and the heap allocator), so it bypasses hardware capacity planning and conventional bandwidth-based DoS defenses. This is effectively a critical availability risk for any organization that exposes vLLM directly or proxies it without upstream input validation. vLLM is the de facto standard open-source OpenAI-compatible inference server, which widens the blast radius across the AI infrastructure ecosystem.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
vLLM	pip	>= 0.1.0, < 0.19.0	0.19.0

Do you run a vLLM version earlier than 0.19.0? You're affected.
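You can check a Python environment against the affected range above (>= 0.1.0, < 0.19.0) with the standard library alone. Here is a minimal sketch. It assumes vLLM was installed from PyPI under the distribution name vllm. The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIRST_AFFECTED = (0, 1, 0)
FIRST_FIXED = (0, 19, 0)


def parse_release(ver: str) -> tuple[int, int, int]:
    """Return the numeric major.minor.patch prefix, e.g. "0.18.1" -> (0, 18, 1).

    Simplification: suffixes such as rc1, .post1, or +local are ignored, so a
    0.19.0 release candidate is treated as 0.19.0; check those by hand.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", ver)
    if m is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def is_vulnerable(ver: str) -> bool:
    """True if ver falls in the affected range >= 0.1.0, < 0.19.0."""
    return FIRST_AFFECTED <= parse_release(ver) < FIRST_FIXED


def installed_vllm_status() -> str:
    try:
        ver = version("vllm")
    except PackageNotFoundError:
        return "vllm is not installed in this environment"
    verdict = "VULNERABLE, upgrade to >= 0.19.0" if is_vulnerable(ver) else "outside the affected range"
    return f"vllm {ver}: {verdict}"


if __name__ == "__main__":
    print(installed_vllm_status())
```

Tuple comparison keeps the range check readable. This check only covers the Python environment it runs in. Containers and remote hosts need to be checked separately.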

How severe is it?

CVSS 3.1: 6.5 / 10

EPSS: 0.3% chance of exploitation in the next 30 days (higher than 25% of all CVEs)

Source: EPSS v3, FIRST.org

Exploitation Status: No known exploitation

Sophistication: Trivial

What is the attack surface?

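Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High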
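Together, the eight metrics above form the vector CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H. Below is a minimal sketch of the CVSS v3.1 base-score equations, using the weights from the FIRST v3.1 specification. It handles only the scope-unchanged case that applies here. It reproduces the 6.5 score and shows how much the Privileges Required rating matters: the same vector with PR:N scores 7.5 (High).

```python
import math

# CVSS v3.1 base-metric weights from the FIRST specification (Scope: Unchanged only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}  # Scope: Changed uses different PR weights
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, avoiding float drift."""
    i = round(x * 100_000)
    if i % 10_000 == 0:
        return i / 100_000.0
    return (math.floor(i / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector string with S:U."""
    m = dict(part.split(":") for part in vector.split("/")[1:])
    if m["S"] != "U":
        raise NotImplementedError("sketch handles Scope: Unchanged only")
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[m["PR"]] * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 6.5
print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # 7.5
```

With only A:H set, the impact subscore is fixed at 6.42 × 0.56 ≈ 3.60. The remaining difference comes entirely from the exploitability term: about 2.84 with PR:L versus about 3.89 with PR:N.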

What should I do?

5 steps

1. PATCH: Upgrade to vLLM >= 0.19.0, which adds Pydantic Field upper-bound validation on n in ChatCompletionRequest and CompletionRequest.
2. WORKAROUND (if immediate patching is blocked): Deploy an API gateway (Kong, Nginx, AWS API GW) with JSON body inspection that enforces n <= 128 (or your operational maximum) and rejects violations with HTTP 400.
3. NETWORK HARDENING: Never expose vLLM port 8000 directly to the internet; require an authenticated reverse proxy in front of all inference endpoints.
4. DETECTION: Enable HTTP request body logging and alert on n values > 1000 in /v1/chat/completions and /v1/completions payloads; monitor for OOM kills via dmesg/systemd journal and for vLLM process restarts.
5. DEFENSE-IN-DEPTH: Implement per-IP and per-token rate limiting at the load balancer layer; configure container memory limits and auto-restart policies to minimize recovery time after a crash.
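The detection guidance above (alert on n > 1000 in the two completion endpoints) can be sketched as a small log scanner. This is a minimal sketch that assumes a hypothetical JSON-lines request log, with one object per line holding "path", "client", and "body" (the raw request body as a string). Adapt the field names to whatever your proxy actually logs.

```python
import json
from typing import Iterable, Iterator

ALERT_THRESHOLD = 1000  # from the detection guidance: alert on n > 1000
WATCHED_PATHS = {"/v1/chat/completions", "/v1/completions"}


def suspicious_requests(lines: Iterable[str], threshold: int = ALERT_THRESHOLD) -> Iterator[dict]:
    """Yield a summary for each logged request whose body sets n above threshold.

    Assumes a hypothetical JSON-lines log: one object per line with
    "path", "client", and "body" (the raw request body as a string).
    """
    for line in lines:
        try:
            record = json.loads(line)
            if record.get("path") not in WATCHED_PATHS:
                continue
            body = json.loads(record.get("body") or "{}")
        except (ValueError, TypeError, AttributeError):
            continue  # malformed line or body: skip rather than crash the scanner
        n = body.get("n") if isinstance(body, dict) else None
        # bool is a subclass of int, so exclude it; floats are compared numerically.
        if isinstance(n, (int, float)) and not isinstance(n, bool) and n > threshold:
            yield {"client": record.get("client"), "path": record["path"], "n": n}
```

The scanner accepts any iterable of lines, such as an open log file, so it can run in batch or in a tailing loop. Forwarding hits to an alerting system and watching for OOM kills are outside this sketch.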

What does CISA's SSVC say?

Decision: Track

Exploitation: none

Automatable: no

Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

DoS, Inference API
AML.T0029 - Denial of AI Service
AML.T0034 - Cost Harvesting
AML.T0040 - AI Model Inference API Access
AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

6.1 - Actions to address risks and opportunities
8.4 - AI system operation

NIST AI RMF

MANAGE-2.2 - Risk response and recovery mechanisms

OWASP LLM Top 10

LLM10:2025 - Unbounded Consumption

Frequently Asked Questions

What systems are affected by CVE-2026-34756?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, OpenAI-compatible API endpoints, Model serving, AI API gateways, Agentic frameworks using vLLM backend.

What is the CVSS score for CVE-2026-34756?

CVE-2026-34756 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.33%.

What is the AI security impact?

Affected AI Architectures

LLM inference serving, OpenAI-compatible API endpoints, Model serving, AI API gateways, Agentic frameworks using vLLM backend

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service

AML.T0034 Cost Harvesting

AML.T0040 AI Model Inference API Access

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: 6.1, 8.4

NIST AI RMF: MANAGE-2.2

OWASP LLM Top 10: LLM10:2025

What are the technical details?

Original Advisory

vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.19.0, a Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large n value. This completely blocks the Python asyncio event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies in the heap before the request even reaches the scheduling queue. This vulnerability is fixed in 0.19.0.

Exploitation Scenario

A threat actor scans for publicly accessible vLLM instances (port 8000/8080, path /v1/models or /health). After confirming a pre-0.19.0 version, they send a single HTTP POST to /v1/chat/completions with {"model": "any", "messages": [{"role": "user", "content": "hi"}], "n": 2147483647}. The async engine immediately enters a synchronous for-loop that generates ~2 billion child request copies via copy(). This monopolizes the asyncio event loop and drives RSS up by gigabytes per second. The Linux OOM killer terminates the vLLM process within seconds. All downstream AI features (customer-facing chatbots, internal copilots, agentic pipelines) go dark. The attacker needs no authentication bypass and trips no payload-size limit; one small request is enough. A ransomware operator or competitor can automate the request to crash the process each time it restarts, faster than ops teams can respond.
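The mechanism in the scenario above can be modeled in a few lines. This is a hypothetical, simplified model, not vLLM's actual code. Request, fan_out_unbounded, fan_out_bounded, ticks_during_fan_out and MAX_N are illustrative names. The model shows two things. First, a synchronous fan-out starves asyncio: no other task runs until the loop finishes. Second, the fix is to validate n before any per-copy allocation.

```python
import asyncio
from copy import copy
from dataclasses import dataclass

MAX_N = 128  # hypothetical ceiling; the bound vLLM 0.19.0 actually enforces may differ


@dataclass
class Request:
    prompt: str
    n: int = 1


def fan_out_unbounded(req: Request) -> list[Request]:
    # Vulnerable shape: a synchronous loop of attacker-chosen length. It never
    # yields to the event loop, and memory grows linearly with n.
    return [copy(req) for _ in range(req.n)]


def fan_out_bounded(req: Request, max_n: int = MAX_N) -> list[Request]:
    # Fixed shape: validate n before any per-copy allocation happens.
    if not 1 <= req.n <= max_n:
        raise ValueError(f"n must be between 1 and {max_n}, got {req.n}")
    return [copy(req) for _ in range(req.n)]


async def ticks_during_fan_out(n: int) -> int:
    """Count heartbeat-task ticks that happen while the unbounded fan-out runs."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0)

    hb = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat start
    before = ticks
    fan_out_unbounded(Request("hi", n=n))  # no await inside, so no other task runs
    during = ticks - before
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return during
```

Here ticks_during_fan_out always returns 0, whatever n is, because the event loop only switches tasks at an await. Only call fan_out_unbounded with small n; with the value from the scenario it would exhaust memory exactly as described.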

Weaknesses (CWE)

CWE-770 Allocation of Resources Without Limits or Throttling (Primary)

CWE-770 — Allocation of Resources Without Limits or Throttling: The product allocates a reusable resource or group of resources on behalf of an actor without imposing any intended restrictions on the size or number of resources that can be allocated.

[Requirements] Clearly specify the minimum and maximum expectations for capabilities, and dictate which behaviors are acceptable when resource allocation reaches limits.
[Architecture and Design] Limit the amount of resources that are accessible to unprivileged users. Set per-user limits for resources. Allow the system administrator to define these limits. Be careful to avoid CWE-410.

Source: MITRE CWE corpus.
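The CWE-770 mitigation of per-user limits defined by the system administrator can be sketched as a token bucket keyed by client identity (an API key or IP address). This is a minimal sketch; PerClientLimiter and its parameters are hypothetical. Charging one token per requested completion (cost=n) is an illustrative design choice. Neither the advisory nor vLLM prescribes it.

```python
import time
from typing import Callable


class PerClientLimiter:
    """Token bucket per client key; rate and burst are set by the administrator."""

    def __init__(self, rate_per_sec: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self._buckets: dict[str, tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def allow(self, key: str, cost: float = 1.0) -> bool:
        """Spend `cost` tokens from this client's bucket; False means reject."""
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._buckets[key] = (tokens, now)
        return allowed
```

A request whose cost exceeds burst can never pass, so charging cost=n also acts as a hard per-client ceiling on n. The _buckets dictionary grows with the number of distinct keys. A production version would evict idle entries so the limiter does not become its own CWE-770 problem.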