Any public-facing vLLM inference server running a version earlier than 0.19.0 can be taken down with a single unauthenticated HTTP request that sets n to an extremely large integer. No credentials or brute force are needed, and rate limiting does not help because one request is enough. Upgrade to vLLM 0.19.0 immediately; if patching is delayed, deploy an API gateway or WAF rule that enforces a hard ceiling on n (e.g., n ≤ 128) at the HTTP layer. Treat this as critical despite the CVSS 6.5 rating: a single request that destroys the availability of your LLM inference tier is a production outage, not a medium finding.
What is the risk?
The CVSS score of 6.5 (Medium) materially understates operational risk. Exploitation is trivial: a single unauthenticated POST request is enough to crash the vLLM process through an OOM kill. The attack targets the control plane (the Python asyncio event loop and the heap allocator), so hardware capacity planning and conventional bandwidth-based DoS defenses do not mitigate it. For any organization that exposes vLLM directly, or proxies it without upstream input validation, this is effectively a critical availability risk. vLLM's position as a widely deployed open-source OpenAI-compatible inference server extends the blast radius across the AI infrastructure ecosystem.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vllm | pip | >= 0.1.0, < 0.19.0 | 0.19.0 |
If you run vllm at any version from 0.1.0 up to, but not including, 0.19.0, you are affected.
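A quick way to apply the range in the table is to compare a version string against its bounds. The following is a minimal sketch, not an official check: the helper names `release_tuple` and `is_affected` are hypothetical, and the parser only looks at the leading numeric release segment, so pre-release and local suffixes (such as `rc1` or `+cu121`) are ignored.

```python
import re

FIRST_AFFECTED = (0, 1, 0)   # lower bound of the vulnerable range
FIXED = (0, 19, 0)           # first patched release


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '0.18.1+cu121' -> (0, 18, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad '0.18' to (0, 18, 0)
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True if the version falls in >= 0.1.0, < 0.19.0."""
    return FIRST_AFFECTED <= release_tuple(version) < FIXED
```

On a host where the package is installed, the version string can be read with `importlib.metadata.version("vllm")` and passed to `is_affected`.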
What should I do?
1. PATCH: Upgrade to vLLM >= 0.19.0, which adds Pydantic Field upper-bound validation on n in ChatCompletionRequest and CompletionRequest.
2. WORKAROUND (if immediate patching is blocked): Deploy an API gateway (Kong, Nginx, AWS API GW) with JSON body inspection enforcing n <= 128 (or your operational maximum); reject with HTTP 400 on violation.
3. NETWORK HARDENING: Never expose vLLM port 8000 directly to the internet; require an authenticated reverse proxy in front of all inference endpoints.
4. DETECTION: Enable HTTP request body logging and alert on n values > 1000 in /v1/chat/completions and /v1/completions payloads; monitor for OOM kills via dmesg/systemd journal and for vLLM process restarts.
5. DEFENSE-IN-DEPTH: Implement per-IP and per-token rate limiting at the load balancer layer; configure container memory limits and auto-restart policies to minimize recovery time after a crash.
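The workaround step amounts to inspecting the JSON body before it reaches vLLM. The sketch below shows that check as a plain Python function that a gateway plugin or small proxy could call. The function name `check_n` and the returned messages are hypothetical. The ceiling of 128 is the example value from the workaround, not a vLLM constant. Treating a missing or null n as acceptable is an assumption based on n being an optional field.

```python
import json

MAX_N = 128  # example ceiling from the workaround; set to your operational maximum


def check_n(raw_body: bytes, max_n: int = MAX_N):
    """Return (status_code, reason) for a completions request body."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return 400, "body is not valid JSON"
    if not isinstance(payload, dict):
        return 400, "body must be a JSON object"
    n = payload.get("n")
    if n is None:  # assumption: an absent or null n means one completion
        return 200, "ok"
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(n, bool) or not isinstance(n, int):
        return 400, "n must be an integer"
    if not 1 <= n <= max_n:
        return 400, f"n must be between 1 and {max_n}"
    return 200, "ok"
```

Rejecting at this layer means the oversized value never reaches the inference server, which is the point of the workaround; it does not replace the upgrade.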
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
Is CVE-2026-34756 actively exploited?
No confirmed active exploitation of CVE-2026-34756 has been reported, but organizations should still patch proactively.
What systems are affected by CVE-2026-34756?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, OpenAI-compatible API endpoints, Model serving, AI API gateways, Agentic frameworks using vLLM backend.
What is the CVSS score for CVE-2026-34756?
CVE-2026-34756 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.05%.
Technical Details
NVD Description
vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.19.0, a Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large n value. This completely blocks the Python asyncio event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies in the heap before the request even reaches the scheduling queue. This vulnerability is fixed in 0.19.0.
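The missing check described above is an upper bound on a Pydantic field. The sketch below illustrates the general technique with a hypothetical model; it is not vLLM's actual ChatCompletionRequest or CompletionRequest, and the ceiling of 128 is an assumed value rather than the bound chosen in 0.19.0. It requires the third-party `pydantic` package.

```python
from pydantic import BaseModel, Field, ValidationError

MAX_N = 128  # assumed ceiling for illustration only


class BoundedRequest(BaseModel):
    """Hypothetical stand-in for a request model; only the n field is shown."""

    # ge/le make Pydantic reject out-of-range values during validation
    n: int = Field(default=1, ge=1, le=MAX_N)


def accepts(payload: dict) -> bool:
    """True if the payload passes validation, False if Pydantic rejects it."""
    try:
        BoundedRequest(**payload)
        return True
    except ValidationError:
        return False
```

With a bound like this, validation fails while the request is being parsed, before any per-completion objects are created.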
Exploitation Scenario
A threat actor scans for publicly accessible vLLM instances (port 8000/8080, path /v1/models or /health). After confirming a pre-0.19.0 version, they send a single HTTP POST to /v1/chat/completions with {"model": "any", "messages": [{"role": "user", "content": "hi"}], "n": 2147483647}. The async engine immediately enters a synchronous for-loop that generates ~2 billion child request copies via copy(), monopolizing the asyncio event loop and driving RSS up by gigabytes per second. The Linux OOM killer terminates the vLLM process within seconds. All downstream AI features (customer-facing chatbots, internal copilots, agentic pipelines) go dark. The attack needs no authentication bypass, trips no payload-size threshold, and consists of a single small request. A ransomware operator or competitor can automate it to kill the process again after every restart, faster than ops teams can respond.
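A request like the one in this scenario is easy to flag if request bodies are logged. The sketch below scans log lines for oversized n values on the two completions endpoints. The log format is an assumption: one JSON object per line with hypothetical `client`, `path`, and `body` fields. The threshold of 1000 is the alert value suggested in the detection guidance.

```python
import json

WATCHED_PATHS = {"/v1/chat/completions", "/v1/completions"}
ALERT_THRESHOLD = 1000  # alert value from the detection guidance


def suspicious_requests(log_lines, threshold=ALERT_THRESHOLD):
    """Yield (client, path, n) for logged requests whose n exceeds the threshold."""
    for line in log_lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        path = entry.get("path")
        if not isinstance(path, str) or path not in WATCHED_PATHS:
            continue
        body = entry.get("body")
        n = body.get("n") if isinstance(body, dict) else None
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(n, int) and not isinstance(n, bool) and n > threshold:
            yield entry.get("client"), path, n
```

The function accepts any iterable of strings, so it can be fed an open log file directly.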
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
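The 6.5 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The sketch below is a simplified calculator that handles only Scope: Unchanged vectors such as this one. The function names are hypothetical, and the metric weights and rounding rule are taken from the CVSS v3.1 specification.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for Scope: Unchanged)
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place using the spec's integer-based rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged CVSS v3.1 vector."""
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {metric: WEIGHTS[metric][parts[metric]] for metric in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))
```

For the vector above, the impact sub-score is about 3.60 and the exploitability sub-score about 2.84; their sum of roughly 6.43 rounds up to 6.5.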
Related Vulnerabilities
| CVE | Score | Summary | Relation |
|---|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server | Same package: vllm |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue | Same package: vllm |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference | Same package: vllm |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access | Same package: vllm |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ | Same package: vllm |