CVE-2026-5497: vLLM unauthenticated OOM DoS

CISO Take

vLLM 0.8.0+ contains an unauthenticated denial-of-service vulnerability reachable through its OpenAI-compatible chat completions API. A single crafted request that embeds thousands of JPEG frames in a video data URL makes the inference server exhaust memory and crash. Exploitation needs no credentials, no AI/ML expertise, and only one HTTP request, so any attacker with network access to your vLLM endpoint can trivially weaponize it. vLLM is a de facto standard for production LLM serving, so a successful attack takes down your entire inference layer and halts all AI-dependent workloads until the service is restarted, and the attacker can simply repeat the request. Apply the fix from commit 58ee614 immediately, enforce maximum payload size limits at the reverse proxy layer, and restrict multimodal API access to authenticated, trusted clients.

Sources: NVD, GitHub Advisory, ATLAS, huntr.com

What is the risk?

Operational risk is HIGH for any organization running vLLM 0.8.0+ in production. The attack surface is the OpenAI-compatible endpoint, which most vLLM deployments expose and which requires no authentication by default. Exploitation needs no credentials or specialized tooling and produces a deterministic result: a service crash that must be remediated manually. Although no CVSS score has been assigned yet, the combination of no authentication, a single-request exploit, and full availability impact on a critical AI infrastructure component justifies treating this as high severity. Default vLLM deployments with multimodal support enabled are exposed without any additional attacker effort.

How does the attack unfold?

Reconnaissance

Attacker identifies an exposed vLLM OpenAI-compatible API endpoint via port scanning or service discovery; no credentials are required to probe or confirm the service.

AML.T0006

Payload Crafting

Attacker constructs a chat completions request embedding a video/jpeg data URL with thousands of comma-separated base64 JPEG frames to maximize per-request memory allocation.

AML.T0034.001

Exploitation

A single unauthenticated POST to /v1/chat/completions triggers unbounded frame decoding in VideoMediaIO.load_base64(), exhausting available server RAM within seconds.

AML.T0049

Impact

The vLLM process is killed by the OOM killer, taking down the entire inference service and denying AI capabilities to all users until an operator manually restarts the service.

AML.T0029
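To see why a few thousand frames are enough to exhaust RAM, it helps to estimate the decoded size. The sketch below is a back-of-the-envelope calculation, assuming each JPEG frame is decoded into an 8-bit RGB array of width × height × 3 bytes; the 1080p resolution and the 5,000-frame count are illustrative values, not figures from the advisory.

```python
# Back-of-the-envelope estimate of memory held by decoded video frames.
# Assumption: each frame becomes an 8-bit RGB array (width x height x 3 bytes).
# Real usage may be higher if the decoded frames are copied again afterwards.


def decoded_frame_bytes(width: int, height: int, channels: int = 3) -> int:
    """Bytes needed to hold one decoded 8-bit frame."""
    return width * height * channels


def decoded_video_bytes(num_frames: int, width: int, height: int, channels: int = 3) -> int:
    """Bytes needed to hold every decoded frame in memory at the same time."""
    return num_frames * decoded_frame_bytes(width, height, channels)


if __name__ == "__main__":
    # Illustrative values only: 1080p frames, 5,000 of them in one request.
    per_frame = decoded_frame_bytes(1920, 1080)
    total = decoded_video_bytes(5_000, 1920, 1080)
    print(f"one 1080p frame: {per_frame / 2**20:.1f} MiB")
    print(f"5,000 frames:    {total / 2**30:.1f} GiB")
```

Under these assumptions one frame needs about 5.9 MiB and 5,000 frames need about 29 GiB. Because simple images can compress very well as JPEG, the encoded request may be far smaller than its decoded size, which is why a body-size limit narrows the attack while a frame-count limit targets the root cause.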

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
vLLM	pip	>= 0.8.0	No patched release listed (fix in commit 58ee614)

If you run vLLM 0.8.0 or later without commit 58ee614, you are affected.

How severe is it?

CVSS 3.1

N/A

EPSS

N/A

Exploitation Status

No known exploitation

Sophistication

Trivial

What should I do?

Patch: upgrade to a vLLM release containing commit 58ee61422169ce17e08248f8efa1e9df434fe395 or later.
Immediate workaround: block requests containing 'video/jpeg' data URLs at the reverse proxy or API gateway layer if multimodal video input is not required.
Payload limits: configure a strict maximum request body size (e.g., nginx client_max_body_size) to prevent oversized multimodal payloads from reaching the vLLM process.
Rate limiting: enforce per-IP and per-API-key request rate limits on the chat completions endpoint.
Network controls: never expose the vLLM API to untrusted networks without an authenticated gateway in front, because the endpoint requires no authentication by default.
Detection: alert on OOM kills of the vLLM process (in kernel, systemd, or container runtime logs) and on unusually large incoming request payloads to multimodal endpoints.
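The payload-limit and video/jpeg blocking steps above can be sketched as a request check run in a Python gateway before traffic reaches vLLM. This is a minimal sketch, assuming video input arrives as an OpenAI-style `video_url` content part (`{"type": "video_url", "video_url": {"url": ...}}`); `check_chat_request`, `MAX_BODY_BYTES`, and `MAX_JPEG_FRAMES` are hypothetical names and limits, not vLLM or nginx settings.

```python
import json
from typing import Any, Iterator

# Hypothetical limits for illustration; tune them for your own workloads.
MAX_BODY_BYTES = 10 * 1024 * 1024  # 10 MiB cap on the raw request body
MAX_JPEG_FRAMES = 64               # cap on frames in one video/jpeg data URL


def _content_parts(request: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield each structured content part from the chat messages."""
    messages = request.get("messages")
    if not isinstance(messages, list):
        return
    for message in messages:
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict):
                    yield part


def check_chat_request(raw_body: bytes, allow_video_jpeg: bool = True) -> tuple[bool, str]:
    """Return (allowed, reason) for a /v1/chat/completions request body."""
    # 1. Enforce the body-size limit before doing any parsing work.
    if len(raw_body) > MAX_BODY_BYTES:
        return False, "request body too large"
    # 2. Parse the JSON so escaped characters cannot hide the media type.
    try:
        request = json.loads(raw_body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return False, "invalid JSON"
    if not isinstance(request, dict):
        return False, "unexpected request shape"
    # 3. Inspect every video_url part that carries a video/jpeg data URL.
    for part in _content_parts(request):
        if part.get("type") != "video_url":
            continue
        video = part.get("video_url")
        url = video.get("url") if isinstance(video, dict) else None
        if not isinstance(url, str):
            continue
        header, _, payload = url.partition(",")
        if not header.lower().startswith("data:video/jpeg"):
            continue
        if not allow_video_jpeg:
            return False, "video/jpeg data URLs are blocked"
        # Frames are comma-separated, so the frame count is commas + 1.
        frame_count = payload.count(",") + 1
        if frame_count > MAX_JPEG_FRAMES:
            return False, f"too many JPEG frames ({frame_count})"
    return True, "ok"
```

Parsing the JSON rather than matching the raw body against the string `video/jpeg` matters because JSON allows escapes such as `\u0076ideo/jpeg`, which a plain substring filter would miss. A gateway would call `check_chat_request(body)` before forwarding and answer with an HTTP 400 or 413 error when it returns `False`.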

How is it classified?

DoS, Inference API, AML.T0029 - Denial of AI Service, AML.T0034.001 - Resource-Intensive Queries, AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 9 - Risk Management System

ISO 42001

A.6.2.3 - AI system operation and monitoring

NIST AI RMF

MANAGE 2.4 - Residual risks to AI systems and users are monitored and managed

OWASP LLM Top 10

LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2026-5497?

CVE-2026-5497 is an unauthenticated out-of-memory denial-of-service vulnerability in vLLM 0.8.0 and later. A single chat completions request carrying thousands of comma-separated JPEG frames in a video/jpeg data URL makes the server decode every frame into memory and crash. The fix is in commit 58ee614.

Is CVE-2026-5497 actively exploited?

No confirmed active exploitation of CVE-2026-5497 has been reported, but organizations should still patch proactively.

How to fix CVE-2026-5497?

Upgrade to a vLLM release containing commit 58ee61422169ce17e08248f8efa1e9df434fe395 or later. Until then, block 'video/jpeg' data URLs at the reverse proxy or API gateway if video input is not required, cap the request body size, rate-limit the chat completions endpoint per IP and per API key, keep the API behind an authenticated gateway, and alert on OOM kills and unusually large payloads on multimodal endpoints.

What systems are affected by CVE-2026-5497?

This vulnerability affects the following AI/ML architecture patterns: LLM inference servers, OpenAI-compatible API deployments, multimodal AI pipelines, model serving infrastructure, and multi-tenant AI platforms.

What is the CVSS score for CVE-2026-5497?

No CVSS score has been assigned yet.

What is the AI security impact?

Affected AI Architectures

LLM inference servers

OpenAI-compatible API deployments

Multimodal AI pipelines

Model serving infrastructure

Multi-tenant AI platforms

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service

AML.T0034.001 Resource-Intensive Queries

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 9

ISO 42001: A.6.2.3

NIST AI RMF: MANAGE 2.4

OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

vLLM versions 0.8.0 and later are vulnerable to an Out-of-Memory (OOM) Denial of Service (DoS) attack due to unbounded frame count processing in the `VideoMediaIO.load_base64()` method. When processing `video/jpeg` data URLs, the method splits the base64 data string on commas to extract individual JPEG frames without enforcing a frame count limit. An attacker can exploit this by crafting a single API request containing thousands of comma-separated base64-encoded JPEG frames in a data URL, causing the server to decode all frames into memory and crash due to excessive memory consumption. This vulnerability is reachable via the OpenAI-compatible chat completions API and does not require authentication.
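The root cause described above is a comma split with no upper bound, roughly the pattern `[decode(f) for f in data.split(",")]`. The sketch below shows the bounded alternative: count the frames before decoding any of them and refuse the input if the count exceeds a cap. It is a minimal illustration, not the code from commit 58ee614; `split_jpeg_frames`, `TooManyFramesError`, and `MAX_FRAMES` are hypothetical, and each frame is only base64-decoded (not JPEG-decoded) so the sketch runs without an imaging library.

```python
import base64

# Hypothetical cap for illustration; not the limit chosen by the upstream fix.
MAX_FRAMES = 32


class TooManyFramesError(ValueError):
    """Raised when a video/jpeg payload carries more frames than allowed."""


def split_jpeg_frames(data: str, max_frames: int = MAX_FRAMES) -> list[bytes]:
    """Split comma-separated base64 frames, refusing oversized inputs up front.

    The frame count is checked before any frame is decoded, so an oversized
    payload is rejected without allocating per-frame memory.
    """
    frame_count = data.count(",") + 1
    if frame_count > max_frames:
        raise TooManyFramesError(
            f"{frame_count} frames exceeds limit of {max_frames}"
        )
    frames = []
    for frame_b64 in data.split(","):
        # validate=True rejects characters outside the base64 alphabet.
        frames.append(base64.b64decode(frame_b64, validate=True))
    return frames
```

Checking `data.count(",")` first keeps the rejection cheap: it scans the string once and allocates nothing per frame, whereas the unbounded version only fails after memory is already exhausted.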

Exploitation Scenario

An attacker discovers a vLLM instance via port scanning or Shodan and confirms that the OpenAI-compatible API is reachable. They send a single HTTP POST to /v1/chat/completions with a user message containing a 'video/jpeg' data URL whose base64 field holds thousands of comma-separated base64-encoded JPEG strings. vLLM's VideoMediaIO.load_base64() splits the string on commas with no frame count limit and decodes every frame, holding all of them in memory at once. The server's RAM is exhausted within seconds, the OOM killer terminates the Python process, and all in-flight inference requests are dropped. The attack requires no authentication, no API key, and no specialized knowledge, only a crafted HTTP request, and it can be scripted and repeated to maintain a persistent denial of service.