CVE-2026-5497: vLLM: unauthenticated OOM DoS via video frame parsing
Status: awaiting NVD.
vLLM 0.8.0 and later contain an unauthenticated denial-of-service vulnerability reachable through the OpenAI-compatible chat completions API: a single crafted request that embeds thousands of JPEG frames in a video data URL makes the inference server exhaust memory and crash. Exploitation requires no credentials, no AI/ML expertise, and only one HTTP request, so any attacker with network access to your vLLM endpoint can trigger it. Because vLLM is widely used for production LLM serving, a successful attack can take down the affected inference server and halt every workload that depends on it until the process is restarted. Apply the fix from commit 58ee614 immediately, enforce maximum payload sizes at the reverse proxy, and restrict multimodal API access to authenticated, trusted clients.
What is the risk?
HIGH operational risk for any organization running vLLM 0.8.0 or later in production. The attack surface is the OpenAI-compatible endpoint, which is enabled in most vLLM deployments and requires no authentication by default. Exploitation needs no credentials or specialized tooling and produces a predictable result: a server crash that persists until the process is restarted. CVSS has not been scored yet, but the lack of an authentication requirement, the single-request exploit, and the full availability impact on a critical AI infrastructure component justify treating this as high severity. Default vLLM deployments with multimodal support enabled are exposed without any additional attacker effort.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vLLM | pip | >= 0.8.0 | Fixed in commit 58ee614 (release version not listed) |
If you run vLLM 0.8.0 or later without commit 58ee614, you are affected.
What should I do?
1. Patch: upgrade to a vLLM release that includes commit 58ee61422169ce17e08248f8efa1e9df434fe395.
2. Immediate workaround: if you do not need multimodal video input, block requests containing 'video/jpeg' data URLs at the reverse proxy or API gateway.
3. Payload limits: configure a strict maximum request body size (e.g., nginx client_max_body_size) so oversized multimodal payloads never reach the vLLM process.
4. Rate limiting: enforce per-IP and per-API-key rate limits on the chat completions endpoint.
5. Network controls: never expose the vLLM API to untrusted networks without an authenticating gateway in front of it; the vulnerable endpoint requires no authentication by default.
6. Detection: alert on OOM kills of the vLLM process (for example in kernel or container logs) and on unusually large request payloads to multimodal endpoints.
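The immediate workaround and payload limits above can be approximated in a gateway that inspects request bodies before forwarding them to vLLM. This is a minimal sketch, not part of vLLM: `check_chat_request`, the 10 MiB body limit, and the 32-frame limit are hypothetical. It scans every string in the JSON body instead of assuming a particular message schema, and it counts frames using the comma-separated format described in the advisory.

```python
import json
from typing import Iterator, Optional

# Hypothetical limits for illustration; tune them for your deployment.
MAX_BODY_BYTES = 10 * 1024 * 1024  # 10 MiB
MAX_VIDEO_FRAMES = 32
VIDEO_JPEG_PREFIX = "data:video/jpeg"


def _iter_strings(node: object) -> Iterator[str]:
    """Yield every string value nested anywhere in a decoded JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from _iter_strings(item)


def check_chat_request(raw_body: bytes, allow_video: bool = False) -> Optional[str]:
    """Return a rejection reason for a chat completions body, or None to forward it."""
    if len(raw_body) > MAX_BODY_BYTES:
        return "request body too large"
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return "malformed JSON"
    for value in _iter_strings(payload):
        # Data URL schemes and media types are case-insensitive, so normalize
        # only a short prefix rather than the whole (possibly huge) string.
        if not value[:64].lower().startswith(VIDEO_JPEG_PREFIX):
            continue
        if not allow_video:
            return "video/jpeg data URLs are not accepted"
        # "data:video/jpeg;base64,<f1>,<f2>,...": one comma ends the header and
        # one precedes each later frame, so the comma count equals the frame count.
        if value.count(",") > MAX_VIDEO_FRAMES:
            return "too many video frames"
    return None
```

The body-size check here only runs after the gateway has read the request; enforcing the same limit in the reverse proxy (for example with nginx client_max_body_size) rejects oversized bodies before they are buffered. A filter like this is a stopgap, and upgrading to a release with the fix remains the primary remediation.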
Frequently Asked Questions
Is CVE-2026-5497 actively exploited?
No confirmed active exploitation of CVE-2026-5497 has been reported, but organizations should still patch proactively.
What systems are affected by CVE-2026-5497?
This vulnerability affects the following AI/ML architecture patterns: LLM inference servers, OpenAI-compatible API deployments, multimodal AI pipelines, model serving infrastructure, and multi-tenant AI platforms.
What is the CVSS score for CVE-2026-5497?
No CVSS score has been assigned yet.
What is the AI security impact?
MITRE ATLAS Techniques
- AML.T0029 Denial of AI Service
- AML.T0034.001 Resource-Intensive Queries
- AML.T0049 Exploit Public-Facing Application
What are the technical details?
Original Advisory
vLLM versions 0.8.0 and later are vulnerable to an Out-of-Memory (OOM) Denial of Service (DoS) attack due to unbounded frame count processing in the `VideoMediaIO.load_base64()` method. When processing `video/jpeg` data URLs, the method splits the base64 data string on commas to extract individual JPEG frames without enforcing a frame count limit. An attacker can exploit this by crafting a single API request containing thousands of comma-separated base64-encoded JPEG frames in a data URL, causing the server to decode all frames into memory and crash due to excessive memory consumption. This vulnerability is reachable via the OpenAI-compatible chat completions API and does not require authentication.
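The remedy implied by the advisory is to bound the frame count before any frame is decoded. The sketch below is illustrative only: it is not vLLM's code, and `split_jpeg_frames`, `TooManyFramesError`, and the 32-frame `MAX_FRAMES` cap are hypothetical names and values. It decodes only the base64 layer; a real loader would then decode each frame into pixels, which is where memory use grows fastest.

```python
import base64
import binascii

# Hypothetical cap; the limit enforced by vLLM after the fix may differ.
MAX_FRAMES = 32


class TooManyFramesError(ValueError):
    """Raised when a video/jpeg payload carries more frames than allowed."""


def split_jpeg_frames(b64_payload: str, max_frames: int = MAX_FRAMES) -> list[bytes]:
    """Split comma-separated base64 JPEG frames, enforcing a frame cap first.

    The vulnerable pattern decodes every element of b64_payload.split(",")
    with no check; here the count is validated before anything is decoded.
    """
    # Count separators instead of materializing the split list first.
    frame_count = b64_payload.count(",") + 1
    if frame_count > max_frames:
        raise TooManyFramesError(
            f"payload has {frame_count} frames; limit is {max_frames}"
        )
    frames = []
    for chunk in b64_payload.split(","):
        try:
            frames.append(base64.b64decode(chunk, validate=True))
        except binascii.Error as exc:
            raise ValueError("frame is not valid base64") from exc
    return frames
```

Counting commas before calling split() means a rejected payload never allocates a list of thousands of frame strings, let alone the decoded images.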
Exploitation Scenario
An attacker discovers a vLLM instance via port scanning or Shodan and confirms that the OpenAI-compatible API is reachable. They send a single HTTP POST to /v1/chat/completions with a user message containing a 'video/jpeg' data URL whose base64 field holds thousands of comma-separated base64-encoded JPEG strings. vLLM's VideoMediaIO.load_base64() splits the string on commas without a frame count limit and decodes every frame into memory. Server RAM is quickly exhausted, the OOM killer terminates the Python process, and all in-flight inference requests are dropped. The attack needs no authentication, no API key, and no specialized knowledge beyond a crafted HTTP request, and it can be scripted and repeated to keep the service down.
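A rough calculation shows why thousands of frames can exhaust RAM. This sketch assumes each frame is decoded at full resolution into an 8-bit RGB array; the 1920x1080 resolution and the 5,000-frame count are hypothetical illustration values, and the estimate ignores the base64 input itself, decoder overhead, and any intermediate copies.

```python
# Assumption: every frame is decoded to an 8-bit array at full resolution.
GIB = 1024 ** 3


def decoded_video_bytes(frames: int, width: int, height: int, channels: int = 3) -> int:
    """Bytes needed to hold `frames` decoded frames as uint8 pixel arrays."""
    return frames * width * height * channels


# Hypothetical example: 1920x1080 RGB frames.
per_frame = decoded_video_bytes(1, 1920, 1080)
total = decoded_video_bytes(5000, 1920, 1080)
print(per_frame)              # prints 6220800 (about 5.9 MiB per frame)
print(round(total / GIB, 1))  # prints 29.0 (GiB for 5,000 frames)
```

Memory grows linearly with the frame count, so without a cap the request alone decides how much the server allocates.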
Related Vulnerabilities (same package: vllm)
- CVE-2024-9053 (9.8): vllm: RCE via unsafe pickle deserialization in RPC server
- CVE-2024-11041 (9.8): vllm: RCE via unsafe pickle deserialization in MessageQueue
- CVE-2025-47277 (9.8): vLLM: RCE via exposed TCPStore in distributed inference
- CVE-2026-25960 (9.8): vllm: SSRF allows internal network access
- CVE-2025-32444 (9.8): vLLM: RCE via pickle deserialization on ZeroMQ