CVE-2026-34756: vLLM: DoS via unbounded n parameter causes OOM crash

GHSA-3mwp-wvh9-7528 MEDIUM
Published April 3, 2026
CISO Take

Any public-facing vLLM inference server running a version earlier than 0.19.0 can be taken down with a single unauthenticated HTTP request that sets n to an astronomically large integer. No credentials or brute force are needed, and one request is far too little traffic for a rate limiter to catch. Upgrade to vLLM 0.19.0 immediately; if patching is delayed, deploy an API gateway or WAF rule enforcing a hard ceiling on n (e.g., n ≤ 128) at the HTTP layer. Treat this as critical despite the CVSS 6.5 rating: trivially crashing your LLM inference tier with one request is a production outage, not a medium finding.

What is the risk?

CVSS 6.5 (Medium) materially understates the operational risk. Exploitation is trivial: a single unauthenticated POST request is enough to get the vLLM process OOM-killed. The attack targets the API server's control plane (the Python asyncio event loop and the process heap) rather than GPU capacity, so hardware capacity planning does not help and bandwidth-based DoS defenses never trigger. For any organization that exposes vLLM directly, or proxies it without upstream input validation, this is effectively a critical availability risk. Because vLLM holds a dominant position as the standard open-source OpenAI-compatible inference server, the blast radius extends across much of the AI infrastructure ecosystem.

What systems are affected?

Package   Ecosystem   Vulnerable range       Patched
vllm      pip         >= 0.1.0, < 0.19.0     0.19.0

127 dependents · 56% patched · ~33 days to patch

Do you run vllm older than 0.19.0? You're affected.
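A quick way to check an environment against the affected range in the table above is sketched below. It is a minimal sketch: `parse_release` and `is_vulnerable` are hypothetical helper names, and for simplicity it compares only the leading X.Y.Z numbers, so a pre-release such as 0.19.0rc1 is treated as 0.19.0 even though that build may or may not contain the fix. For exact PEP 440 ordering, the `packaging` library's `Version` class is the safer choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIRST_AFFECTED = (0, 1, 0)  # lower bound of the advisory's vulnerable range
PATCHED = (0, 19, 0)        # first fixed release per the advisory


def parse_release(v: str) -> tuple[int, ...]:
    """Extract the leading X.Y.Z numbers; suffixes like rc1, .dev0 or +cu121 are ignored."""
    m = re.match(r"^\s*v?(\d+)(?:\.(\d+))?(?:\.(\d+))?", v)
    if not m:
        raise ValueError(f"unrecognised version string: {v!r}")
    return tuple(int(part) if part is not None else 0 for part in m.groups())


def is_vulnerable(v: str) -> bool:
    """True if v falls in the affected range >= 0.1.0, < 0.19.0."""
    return FIRST_AFFECTED <= parse_release(v) < PATCHED


if __name__ == "__main__":
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        status = "VULNERABLE" if is_vulnerable(installed) else "not in affected range"
        print(f"vllm {installed}: {status}")
```

Run it inside each serving image or virtualenv; a host-wide pip check can miss containers that pin their own vllm version.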

Severity & Risk

CVSS 3.1: 6.5 / 10
EPSS: 0.05% chance of exploitation in the next 30 days (higher than 15% of all CVEs)
Exploitation status: No known exploitation
Sophistication: Trivial

Attack Surface

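AV (Attack Vector): Network
AC (Attack Complexity): Low
PR (Privileges Required): Low
UI (User Interaction): None
S (Scope): Unchanged
C (Confidentiality): None
I (Integrity): None
A (Availability): High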

What should I do?

5 steps
  1. PATCH

    Upgrade to vLLM >= 0.19.0, which adds Pydantic Field upper-bound validation on n in ChatCompletionRequest and CompletionRequest.

  2. WORKAROUND

    If immediate patching is blocked, deploy an API gateway (Kong, Nginx, AWS API Gateway) with JSON body inspection that enforces n <= 128 (or your operational maximum) and rejects violations with HTTP 400.

  3. NETWORK HARDENING

    Never expose vLLM's port (8000 by default) directly to the internet; require an authenticated reverse proxy in front of all inference endpoints.

  4. DETECTION

    Enable HTTP request body logging and alert on n values > 1000 in /v1/chat/completions and /v1/completions payloads; monitor for OOM kills (via dmesg or the systemd journal) and for vLLM process restarts.

  5. DEFENSE-IN-DEPTH

    Implement per-IP and per-API-key rate limiting at the load balancer layer; configure container memory limits and auto-restart policies to shorten recovery time after a crash.
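The WORKAROUND step amounts to a small piece of request-inspection logic. The sketch below shows that logic in Python, assuming your gateway lets you run a hook over the raw body of each request; `check_request`, `MAX_N` and `GUARDED_PATHS` are hypothetical names, and 128 is the example ceiling from the step, not a vLLM default. It rejects boolean and non-integer n values as well as out-of-range ones, because JSON `true` decodes to a Python bool, which is a subclass of int.

```python
import json

MAX_N = 128  # example ceiling from the WORKAROUND step; tune to your workload
GUARDED_PATHS = {"/v1/chat/completions", "/v1/completions"}


def check_request(path: str, body: bytes, max_n: int = MAX_N) -> tuple[int, str]:
    """Return (HTTP status, reason): 200 means forward to vLLM, 400 means reject."""
    if path not in GUARDED_PATHS:
        return 200, "ok"
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return 400, "body is not valid JSON"
    if not isinstance(payload, dict):
        return 400, "body must be a JSON object"
    n = payload.get("n")
    if n is None:  # omitted or null: leave it to the backend's default handling
        return 200, "ok"
    # JSON true/false decode to bool (a subclass of int), so exclude them explicitly;
    # floats such as 1e308 are rejected because n must be an integer.
    if isinstance(n, bool) or not isinstance(n, int):
        return 400, "n must be an integer"
    if not 1 <= n <= max_n:
        return 400, f"n must be between 1 and {max_n}"
    return 200, "ok"


if __name__ == "__main__":
    print(check_request("/v1/chat/completions", b'{"n": 2147483647}'))
    # (400, 'n must be between 1 and 128')
```

Whatever gateway you use, make sure it interprets the body the same way vLLM does: if the two parsers disagree on edge cases such as duplicate "n" keys (Python's json keeps the last one), a large value could slip past the check.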

CISA SSVC Assessment

Decision Track
Exploitation none
Automatable No
Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
6.1 - Actions to address risks and opportunities
8.4 - AI system operation
NIST AI RMF
MANAGE-2.2 - Risk response and recovery mechanisms
OWASP LLM Top 10
LLM10:2025 - Unbounded Consumption

Frequently Asked Questions

What is CVE-2026-34756?

CVE-2026-34756 is a denial-of-service vulnerability in the OpenAI-compatible API server of vLLM versions 0.1.0 up to (but not including) 0.19.0. Because the n parameter has no upper bound, a single request with a huge n value blocks the server's event loop and exhausts memory until the process is OOM-killed.

Is CVE-2026-34756 actively exploited?

No confirmed active exploitation of CVE-2026-34756 has been reported, but organizations should still patch proactively.

How to fix CVE-2026-34756?

Upgrade to vLLM 0.19.0 or later. If patching must wait, cap n at an API gateway, keep vLLM behind an authenticated reverse proxy, alert on oversized n values and OOM kills, and add rate limiting and container memory limits, as described in the five steps under "What should I do?".
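The DETECTION step under "What should I do?" can be prototyped as two small log filters. This is a sketch under assumptions: the request log is assumed to be JSON lines with hypothetical "path" and "body" fields, and the OOM pattern targets the kernel's "Out of memory: Killed process <pid> (<name>)" message (older kernels print "Kill process", and container memory limits produce "Memory cgroup out of memory: ..."). Adjust both to what your proxy and hosts actually emit.

```python
import json
import re

ALERT_N = 1000  # alert threshold from the DETECTION step
WATCHED_PATHS = ("/v1/chat/completions", "/v1/completions")
# Matches "Out of memory: Killed process 123 (python3)" and the older "Kill process" form;
# IGNORECASE also covers "Memory cgroup out of memory: Killed process ..." from container limits.
OOM_RE = re.compile(r"out of memory: kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE)


def suspicious_requests(log_lines):
    """Yield (path, n) for request-log records whose n exceeds ALERT_N.

    Assumes a hypothetical JSON-lines log with "path" and parsed "body" fields.
    """
    for line in log_lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # skip lines that are not JSON
        if not isinstance(record, dict) or record.get("path") not in WATCHED_PATHS:
            continue
        body = record.get("body")
        n = body.get("n") if isinstance(body, dict) else None
        if isinstance(n, int) and not isinstance(n, bool) and n > ALERT_N:
            yield record["path"], n


def oom_kills(kernel_lines):
    """Yield (pid, process name) for OOM-killer events in dmesg or journal output."""
    for line in kernel_lines:
        match = OOM_RE.search(line)
        if match:
            yield int(match.group(1)), match.group(2)
```

In practice you would feed these from a log shipper (for example, piping `journalctl -k` output into `oom_kills`) and raise an alert on any hit rather than printing.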

What systems are affected by CVE-2026-34756?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, OpenAI-compatible API endpoints, model serving, AI API gateways, and agentic frameworks that use a vLLM backend.

What is the CVSS score for CVE-2026-34756?

CVE-2026-34756 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.05%.

Technical Details

NVD Description

vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.19.0, a Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large n value. This completely blocks the Python asyncio event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies in the heap before the request even reaches the scheduling queue. This vulnerability is fixed in 0.19.0.
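The core of the description is that the request fan-out runs as a synchronous loop inside an async server, so nothing else on the event loop can run until it finishes. The toy below illustrates that effect; it is not vLLM code. `ToyRequest` is a hypothetical stand-in for a parsed request, and a heartbeat task plays the role of every other client and health check that shares the loop. Because the fan-out never awaits, the worst gap between heartbeats is always at least as long as the fan-out itself.

```python
import asyncio
import copy
import time
from dataclasses import dataclass


@dataclass
class ToyRequest:
    """Hypothetical stand-in for a parsed completion request (not vLLM's class)."""
    prompt: str
    n: int


async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    # Records a timestamp roughly every 10 ms while the event loop is free.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def fan_out(req: ToyRequest) -> int:
    # Plain synchronous loop with no await inside: the event loop cannot run
    # any other task (health checks, other clients) until it finishes.
    children = [copy.copy(req) for _ in range(req.n)]
    return len(children)


async def measure(n: int) -> tuple[int, float, float]:
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0.05)   # let the heartbeat establish a baseline
    start = time.monotonic()
    count = await fan_out(ToyRequest("hi", n))
    blocked_for = time.monotonic() - start
    await asyncio.sleep(0.05)   # let the heartbeat resume after the stall
    stop.set()
    await hb
    worst_gap = max(b - a for a, b in zip(ticks, ticks[1:]))
    return count, blocked_for, worst_gap


if __name__ == "__main__":
    count, blocked, gap = asyncio.run(measure(200_000))
    print(f"{count} copies, loop blocked {blocked:.2f}s, worst heartbeat gap {gap:.2f}s")
```

The stall grows linearly with n; with n in the billions, the loop never frees up before memory runs out.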

Exploitation Scenario

A threat actor scans for publicly accessible vLLM instances (ports 8000/8080, paths /v1/models or /health). After confirming a pre-0.19.0 version, they send a single HTTP POST to /v1/chat/completions, using a model name taken from /v1/models, with {"model": "<served model name>", "messages": [{"role": "user", "content": "hi"}], "n": 2147483647}. The async engine immediately enters a synchronous for-loop generating ~2 billion child request copies via copy(), monopolizing the asyncio event loop and driving RSS up by gigabytes per second. The Linux OOM-killer terminates the vLLM process within seconds. All downstream AI features (customer-facing chatbots, internal copilots, agentic pipelines) go dark. No authentication bypass is needed, the request is far too small to trip payload-size limits, and one request is enough. A ransomware operator or competitor can automate this to kill the process again every time it restarts, faster than ops teams can respond.
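A back-of-the-envelope estimate shows why the OOM kill arrives so quickly. The sketch measures, with `tracemalloc`, how many heap bytes one shallow copy of a deliberately tiny, hypothetical `ToyRequest` costs, then extrapolates linearly to the n = 2,147,483,647 used in the scenario. Real request objects carry far more state, so treat the result as a likely lower bound, not as a measurement of vLLM itself.

```python
import copy
import tracemalloc
from dataclasses import dataclass, field

N_ATTACK = 2_147_483_647  # n from the scenario (2**31 - 1)


@dataclass
class ToyRequest:
    """Hypothetical, deliberately tiny stand-in; vLLM's real request objects are larger."""
    prompt: str
    n: int
    sampling: dict = field(default_factory=dict)


def bytes_per_copy(sample: int = 20_000) -> float:
    """Average traced heap bytes per shallow copy, including its slot in the list."""
    req = ToyRequest("hi", N_ATTACK, {"temperature": 0.7})
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        copies = [copy.copy(req) for _ in range(sample)]
        after, _ = tracemalloc.get_traced_memory()  # measured while copies are alive
    finally:
        tracemalloc.stop()
    del copies
    return (after - before) / sample


def projected_gib(n: int = N_ATTACK) -> float:
    """Linear extrapolation of the per-copy cost to n copies, in GiB."""
    return bytes_per_copy() * n / 2**30


if __name__ == "__main__":
    per_copy = bytes_per_copy()
    print(f"~{per_copy:.0f} B per copy -> ~{per_copy * N_ATTACK / 2**30:,.0f} GiB for n = {N_ATTACK:,}")
```

Even a few dozen bytes per copy multiplied by about 2.1 billion copies comes to tens of GiB before any real per-request state is counted.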

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
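The 6.5 score follows directly from this vector via the CVSS v3.1 base-score equations. The sketch below implements those equations for the Scope Unchanged case only, using the metric weights and the Roundup function from the FIRST CVSS v3.1 specification; `base_score` is a hypothetical helper name. For comparison, changing PR:L to PR:N, which would match the NVD description's "unauthenticated attacker", yields 7.5 (High).

```python
import math

# CVSS v3.1 metric weights; PR values are the ones for Scope Unchanged
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, computed in integers."""
    i = round(x * 100_000)
    if i % 10_000 == 0:
        return i / 100_000.0
    return (math.floor(i / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    # "CVSS:3.1/AV:N/..." -> {"AV": "N", ...}, skipping the version prefix
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise NotImplementedError("sketch handles Scope Unchanged only")
    w = {metric: WEIGHTS[metric][parts[metric]] for metric in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 6.5
```

Here Impact = 6.42 × 0.56 ≈ 3.595 and Exploitability = 8.22 × 0.85 × 0.77 × 0.62 × 0.85 ≈ 2.835, so the sum of about 6.43 rounds up to 6.5.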

Timeline

Published
April 3, 2026
Last Modified
April 7, 2026
First Seen
April 3, 2026

Related Vulnerabilities