CVE-2026-34755: vLLM: OOM DoS via unbounded video frame decoding

GHSA-pq5c-rjhq-qp7p MEDIUM
Published April 3, 2026
CISO Take

Any API consumer with a valid token can crash your vLLM inference server by sending a single multimodal request with thousands of base64-encoded JPEG frames, bypassing the built-in frame limit. Patch to vLLM 0.19.0 immediately; if you can't patch, block or rate-limit multimodal endpoints at the API gateway. This is trivially exploitable and availability impact is complete.

What is the risk?

Medium CVSS but operationally high-impact for teams running vLLM in production. The attack requires only low privileges: any authenticated API user can trigger it. The payload is modest relative to its effect (roughly 100 MB on the wire in the exploitation scenario below, at ~20 KB per frame) yet decodes to several gigabytes in memory, a severe amplification ratio. No complex technique is required; the attacker just crafts a data URL with thousands of comma-separated frames. In multi-tenant or public-facing vLLM deployments the availability impact is critical, since a single request can OOM-kill the process serving all tenants.

What systems are affected?

Package   Ecosystem   Vulnerable Range      Patched
vllm      pip         >= 0.7.0, < 0.19.0    0.19.0


Severity & Risk

CVSS 3.1: 6.5 / 10
EPSS: 0.1% chance of exploitation in 30 days (higher than 17% of all CVEs)
Exploitation Status: No known exploitation
Sophistication: Trivial

Attack Surface

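Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High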

What should I do?

  1. PATCH

    Upgrade to vLLM 0.19.0 (PR #38636 adds frame count enforcement to the load_base64 video/jpeg path).

  2. WORKAROUND (if patching is delayed)

    At the API gateway or reverse proxy, enforce a maximum Content-Length on POST requests to /v1/chat/completions (e.g., 50 MB) and reject requests whose body contains data:video/jpeg with more than N commas (approximate frame count check via regex).

  3. RATE LIMIT

    Apply per-user/per-token rate limits on multimodal endpoints to reduce blast radius.

  4. MONITOR

    Alert on RSS memory spikes on vLLM worker processes, or on requests containing data:video/jpeg URLs with large payloads.

  5. NETWORK SEGMENTATION

    Ensure vLLM endpoints are not publicly exposed without authentication; require valid tokens even for internal use.
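
The workaround in step 2 can be sketched as a gateway-side check. This is a minimal illustration, not a vetted filter: the function names (should_reject, count_video_jpeg_frames) and the constants are hypothetical, the 50 MB cap is the example value from step 2, and the frame limit of 32 simply mirrors the num_frames default quoted in the NVD description. Instead of the regex suggested above, the sketch parses the JSON body and counts commas in decoded string values, which also catches commas hidden behind JSON escapes such as \u002c.

```python
import json

MAX_BODY_BYTES = 50 * 1024 * 1024  # example cap from step 2 (assumption: tune per deployment)
MAX_FRAMES = 32                    # mirrors the documented num_frames default (assumption)


def count_video_jpeg_frames(url: str) -> int:
    """Return the number of comma-separated frames in a data:video/jpeg URL, else 0."""
    prefix, sep, payload = url.partition(",")
    if not sep or not prefix.lower().startswith("data:video/jpeg"):
        return 0
    return payload.count(",") + 1


def iter_strings(node):
    """Yield every string value found anywhere in a decoded JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def should_reject(body: bytes,
                  max_body_bytes: int = MAX_BODY_BYTES,
                  max_frames: int = MAX_FRAMES) -> bool:
    """Hypothetical gateway hook: True means the request should be refused."""
    if len(body) > max_body_bytes:
        return True
    try:
        document = json.loads(body)
    except ValueError:
        # Malformed JSON or bad encoding: left for the backend to reject.
        return False
    return any(count_video_jpeg_frames(s) > max_frames
               for s in iter_strings(document))
```

A check like this only narrows the attack surface; it does not replace the upgrade to 0.19.0, and parsing large bodies at the gateway has its own memory cost, which is why the size cap is applied first.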

CISA SSVC Assessment

Decision: Track
Exploitation: none
Automatable: No
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act: Article 15 - Accuracy, Robustness and Cybersecurity
ISO 42001: 8.4 - AI System Operation: Input data controls
NIST AI RMF: MANAGE 2.2 - Mechanisms to sustain AI system value and availability are established
OWASP LLM Top 10: LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2026-34755?

CVE-2026-34755 is a denial-of-service vulnerability in vLLM versions 0.7.0 up to, but not including, 0.19.0. The video/jpeg base64 path in VideoMediaIO.load_base64() does not enforce the num_frames limit, so any API consumer with a valid token can crash the inference server with a single multimodal request containing thousands of base64-encoded JPEG frames. Upgrade to vLLM 0.19.0; if you cannot upgrade yet, block or rate-limit multimodal endpoints at the API gateway.

Is CVE-2026-34755 actively exploited?

No confirmed active exploitation of CVE-2026-34755 has been reported, but organizations should still patch proactively.

How to fix CVE-2026-34755?

  1. PATCH: Upgrade to vLLM 0.19.0 (PR #38636 adds frame count enforcement to the load_base64 video/jpeg path).
  2. WORKAROUND (if patching is delayed): At the API gateway or reverse proxy, enforce a maximum Content-Length on POST requests to /v1/chat/completions (e.g., 50 MB) and reject requests whose body contains data:video/jpeg with more than N commas (approximate frame count check via regex).
  3. RATE LIMIT: Apply per-user/per-token rate limits on multimodal endpoints to reduce blast radius.
  4. MONITOR: Alert on RSS memory spikes on vLLM worker processes, or on requests containing data:video/jpeg URLs with large payloads.
  5. NETWORK SEGMENTATION: Ensure vLLM endpoints are not publicly exposed without authentication; require valid tokens even for internal use.

What systems are affected by CVE-2026-34755?

The affected package is vllm (pip), versions >= 0.7.0 and < 0.19.0. The vulnerability is relevant to the following AI/ML architecture patterns: model serving, multimodal AI pipelines, agent frameworks, and LLM inference APIs.

What is the CVSS score for CVE-2026-34755?

CVE-2026-34755 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.05%.

Technical Details

NVD Description

vLLM is an inference and serving engine for large language models (LLMs). From 0.7.0 to before 0.19.0, the VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py splits video/jpeg data URLs by comma to extract individual JPEG frames, but does not enforce a frame count limit. The num_frames parameter (default: 32), which is enforced by the load_bytes() code path, is completely bypassed in the video/jpeg base64 path. An attacker can send a single API request containing thousands of comma-separated base64-encoded JPEG frames, causing the server to decode all frames into memory and crash with OOM. This vulnerability is fixed in 0.19.0.
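
The missing check described above amounts to bounding the number of frames before any of them is decoded. The following is a minimal sketch of that idea only; it is not the code from vLLM or from the fix, and split_frames_bounded and DEFAULT_NUM_FRAMES are hypothetical names. It uses only the standard library and stops at base64 decoding (no JPEG decoding), assuming a positive frame limit.

```python
import base64

DEFAULT_NUM_FRAMES = 32  # default quoted in the description above


def split_frames_bounded(data: str, num_frames: int = DEFAULT_NUM_FRAMES) -> list[bytes]:
    """Split comma-separated base64 frames, refusing inputs with more than num_frames."""
    if num_frames < 1:
        raise ValueError("num_frames must be positive in this sketch")
    # maxsplit bounds the work: at most num_frames + 1 pieces are ever created,
    # no matter how many commas the attacker supplies.
    parts = data.split(",", num_frames)
    if len(parts) > num_frames:
        raise ValueError(f"too many frames: limit is {num_frames}")
    # Only now is anything decoded, so memory use is bounded by the limit.
    return [base64.b64decode(part, validate=True) for part in parts]
```

The point of the design is ordering: the count is validated on the cheap string representation, so an oversized request fails before any per-frame allocation happens.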

Exploitation Scenario

An attacker with a valid API token (employee, contractor, or compromised credential) crafts a POST to /v1/chat/completions containing a messages array with a content item of type image_url whose url is a data:video/jpeg;base64,<frame1>,<frame2>,...,<frame5000> string. Each frame is a small but valid base64-encoded JPEG (~20 KB compressed). The vLLM server decodes all 5000 frames into numpy arrays (~921 KB each for 640x480 RGB, ~4.6 GB in total), and np.stack() then allocates a combined array of the same size, so peak memory use approaches twice that figure and the server is killed by OOM. At ~20 KB per frame, the 5000-frame payload is roughly 100 MB on the wire, which is within typical request size limits. The crash denies service to all other users. In Kubernetes deployments the pod restarts automatically, but repeated requests keep it in a crash loop.
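
The figures in this scenario can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative, and the base64 line assumes the ~20 KB figure refers to the JPEG before encoding (base64 emits 4 characters per 3 input bytes), in which case the wire size would be about a third larger than 100 MB.

```python
FRAMES = 5000
WIDTH, HEIGHT, CHANNELS = 640, 480, 3   # 8-bit RGB, one byte per channel
JPEG_BYTES_PER_FRAME = 20_000           # ~20 KB per compressed frame

decoded_per_frame = WIDTH * HEIGHT * CHANNELS      # bytes per decoded frame
decoded_total = decoded_per_frame * FRAMES         # bytes held by the frame arrays
peak_with_stack_copy = 2 * decoded_total           # frames plus the stacked copy

raw_jpeg_total = JPEG_BYTES_PER_FRAME * FRAMES     # compressed payload size
# Assumption: 20 KB is the pre-encoding size, so base64 inflates it by 4/3.
base64_total = 4 * ((JPEG_BYTES_PER_FRAME + 2) // 3) * FRAMES

amplification = decoded_total / raw_jpeg_total     # decoded bytes per payload byte

print(f"decoded per frame: {decoded_per_frame / 1e3:.1f} KB")
print(f"decoded total:     {decoded_total / 1e9:.2f} GB")
print(f"peak with copy:    {peak_with_stack_copy / 1e9:.2f} GB")
print(f"payload (JPEG):    {raw_jpeg_total / 1e6:.0f} MB")
print(f"payload (base64):  {base64_total / 1e6:.1f} MB")
print(f"amplification:     {amplification:.1f}x")
```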

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
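
For readers who want to see how this vector yields the 6.5 base score, the following sketch applies the CVSS v3.1 base-score equations for the scope-unchanged case. The metric weights and the rounding rule are taken from the CVSS v3.1 specification; the function names are illustrative, and changed scope, temporal metrics, and environmental metrics are deliberately not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for unchanged scope).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score of a CVSS:3.1 vector with unchanged scope."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    metrics = dict(part.split(":") for part in parts[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch handles unchanged scope only")
    w = {name: table[metrics[name]] for name, table in WEIGHTS.items()}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 6.5
```

With only availability affected, the impact sub-score is 6.42 x 0.56 and the exploitability sub-score is about 2.84; their sum of roughly 6.43 rounds up to 6.5.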

Timeline

Published: April 3, 2026
Last Modified: April 7, 2026
First Seen: April 4, 2026
