Any API consumer with a valid token can crash your vLLM inference server by sending a single multimodal request with thousands of base64-encoded JPEG frames, bypassing the built-in frame limit. Patch to vLLM 0.19.0 immediately; if you can't patch, block or rate-limit multimodal endpoints at the API gateway. This is trivially exploitable and availability impact is complete.
What is the risk?
Medium CVSS but operationally high-impact for teams running vLLM in production. The attack requires only low privileges (any authenticated API user can trigger it), and the request is far smaller than the memory it consumes: roughly 100 MB of JPEG data in the case described under Exploitation Scenario decodes to several gigabytes. No complex technique is required; the attacker just crafts a data URL with thousands of comma-separated frames. In multi-tenant or public-facing vLLM deployments the availability impact is critical, since a single request can OOM-kill the process serving all tenants.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vllm | pip | >= 0.7.0, < 0.19.0 | 0.19.0 |
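To check an environment against the range in the table, a minimal Python sketch follows. The helper names `parse_release` and `is_vulnerable` are hypothetical, and the parser assumes plain `X.Y.Z` version strings: pre-release and local build suffixes are ignored, so a borderline result (for example a 0.19.0 release candidate) should be verified by hand.

```python
import re
from importlib import metadata

# Range taken from the table above: >= 0.7.0, < 0.19.0
FIRST_VULNERABLE = (0, 7, 0)
FIRST_PATCHED = (0, 19, 0)


def parse_release(version: str) -> tuple[int, int, int]:
    """Extract the leading numeric X.Y.Z part; suffixes like 'rc1' or '+cu121' are ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_vulnerable(version: str) -> bool:
    """True if the version falls inside the affected range from the table."""
    return FIRST_VULNERABLE <= parse_release(version) < FIRST_PATCHED


if __name__ == "__main__":
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        status = "VULNERABLE" if is_vulnerable(installed) else "not in the affected range"
        print(f"vllm {installed}: {status}")
```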
What should I do?
1. PATCH: Upgrade to vLLM 0.19.0 (PR #38636 adds frame count enforcement to the load_base64 video/jpeg path).
2. WORKAROUND (if patching is delayed): At the API gateway or reverse proxy, enforce a maximum Content-Length on POST requests to /v1/chat/completions (e.g., 50 MB) and reject requests whose body contains data:video/jpeg with more than N commas (approximate frame count check via regex).
3. RATE LIMIT: Apply per-user/per-token rate limits on multimodal endpoints to reduce blast radius.
4. MONITOR: Alert on RSS memory spikes on vLLM worker processes, or on requests containing data:video/jpeg URLs with large payloads.
5. NETWORK SEGMENTATION: Ensure vLLM endpoints are not publicly exposed without authentication; require valid tokens even for internal use.
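The approximate frame-count check from the WORKAROUND step could look roughly like the Python sketch below. It assumes the filter sees the raw JSON request body as bytes; `MAX_BODY_BYTES`, `MAX_FRAMES`, `count_frames`, and `should_reject` are hypothetical names, and the threshold of 32 is an assumption chosen to match the default num_frames cited in the NVD description.

```python
import re

MAX_BODY_BYTES = 50 * 1024 * 1024  # 50 MB cap, the example figure from the workaround
MAX_FRAMES = 32                    # assumed value for N (mirrors the default num_frames)

# Matches a data:video/jpeg URL inside a JSON string and captures the payload
# up to the closing quote. The first comma ends the header (e.g. ";base64,");
# every later comma separates two frames.
_VIDEO_JPEG_URL = re.compile(rb'data:video/jpeg[^,"]*,([^"]*)')


def count_frames(body: bytes) -> int:
    """Return the largest approximate frame count among video/jpeg data URLs in the body."""
    counts = [m.group(1).count(b",") + 1 for m in _VIDEO_JPEG_URL.finditer(body)]
    return max(counts, default=0)


def should_reject(body: bytes) -> bool:
    """Reject oversized bodies and bodies with too many comma-separated frames."""
    if len(body) > MAX_BODY_BYTES:
        return True
    return count_frames(body) > MAX_FRAMES
```

The check is per URL, so a request carrying several video/jpeg URLs that each stay under the threshold still passes; summing the counts instead of taking the maximum would be the stricter choice.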
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
Is CVE-2026-34755 actively exploited?
No confirmed active exploitation of CVE-2026-34755 has been reported, but organizations should still patch proactively.
What systems are affected by CVE-2026-34755?
The vllm package (pip), versions >= 0.7.0 and < 0.19.0, is affected. The vulnerability is relevant to the following AI/ML architecture patterns: model serving, multimodal AI pipelines, agent frameworks, LLM inference API.
What is the CVSS score for CVE-2026-34755?
CVE-2026-34755 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.05%.
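The 6.5 base score can be reproduced from the vector CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H with the CVSS v3.1 base-score equations. The Python sketch below is deliberately narrow: it hard-codes only the metric weights this vector uses and handles the Scope: Unchanged case only. The function names are illustrative.

```python
import math

# CVSS v3.1 metric weights for the values in this vector (Scope: Unchanged).
WEIGHTS = {
    "AV:N": 0.85, "AC:L": 0.77, "PR:L": 0.62, "UI:N": 0.85,
    "C:N": 0.0, "I:N": 0.0, "A:H": 0.56,
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged vector built from WEIGHTS."""
    metrics = vector.split("/")[1:]  # drop the "CVSS:3.1" prefix
    w = {m.split(":")[0]: WEIGHTS[m] for m in metrics if m != "S:U"}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score_scope_unchanged("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 6.5
```

Impact comes to about 3.60 and exploitability to about 2.84; their sum of about 6.43 rounds up to 6.5.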
Technical Details
NVD Description
vLLM is an inference and serving engine for large language models (LLMs). From 0.7.0 to before 0.19.0, the VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py splits video/jpeg data URLs by comma to extract individual JPEG frames, but does not enforce a frame count limit. The num_frames parameter (default: 32), which is enforced by the load_bytes() code path, is completely bypassed in the video/jpeg base64 path. An attacker can send a single API request containing thousands of comma-separated base64-encoded JPEG frames, causing the server to decode all frames into memory and crash with OOM. This vulnerability is fixed in 0.19.0.
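The missing check can be illustrated with a simplified Python sketch. This is not vLLM's actual code: the function names are hypothetical, frames are only base64-decoded rather than turned into image arrays, and this advisory does not say whether the real patch rejects or truncates oversized inputs (the sketch rejects).

```python
import base64

DEFAULT_NUM_FRAMES = 32  # default cited in the NVD description


def decode_frames_unbounded(data: str) -> list[bytes]:
    """Vulnerable pattern: every comma-separated chunk is decoded, however many there are."""
    return [base64.b64decode(chunk) for chunk in data.split(",")]


def decode_frames_bounded(data: str, num_frames: int = DEFAULT_NUM_FRAMES) -> list[bytes]:
    """Hardened pattern: count the frames first and refuse before decoding anything."""
    frame_count = data.count(",") + 1  # counting commas avoids building the split list
    if frame_count > num_frames:
        raise ValueError(f"{frame_count} frames exceeds the limit of {num_frames}")
    return [base64.b64decode(chunk) for chunk in data.split(",")]
```

Counting commas before splitting keeps the rejection path cheap: an oversized request is refused without allocating per-frame buffers.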
Exploitation Scenario
An attacker with a valid API token (employee, contractor, or compromised credential) crafts a POST to /v1/chat/completions containing a messages array with a content item of type video_url whose url is a data:video/jpeg;base64,<frame1>,<frame2>,...,<frame5000> string. Each frame is a small but valid base64-encoded JPEG (~20 KB compressed). The vLLM server decodes all 5000 frames into numpy arrays (~921 KB each for 640x480 RGB), and np.stack() then allocates a combined array of ~4.6 GB in addition to the per-frame copies, crashing the server with OOM. The 5000-frame payload amounts to roughly 100 MB of JPEG data (about 133 MB once base64-encoded), which fits within the request size limits of many deployments. The crash denies service to all other users. In Kubernetes deployments the pod restarts automatically, but repeated requests keep it in a crash loop.
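The figures in this scenario follow from simple arithmetic, reproduced in the Python snippet below. The inputs (5000 frames, ~20 KB per JPEG, 640x480 RGB at one byte per channel) come from the scenario; the 4/3 size overhead of base64 encoding is standard, and padding is ignored.

```python
FRAMES = 5000
JPEG_BYTES_PER_FRAME = 20_000          # ~20 KB compressed, per the scenario
WIDTH, HEIGHT, CHANNELS = 640, 480, 3  # RGB, one byte (uint8) per channel

decoded_per_frame = WIDTH * HEIGHT * CHANNELS  # 921,600 bytes
decoded_total = decoded_per_frame * FRAMES     # 4,608,000,000 bytes
jpeg_total = JPEG_BYTES_PER_FRAME * FRAMES     # 100,000,000 bytes
base64_total = jpeg_total * 4 // 3             # approximate size as base64 text

print(f"decoded per frame  : {decoded_per_frame / 1e3:.1f} KB")
print(f"decoded, all frames: {decoded_total / 1e9:.2f} GB")
print(f"JPEG payload       : {jpeg_total / 1e6:.0f} MB")
print(f"base64 on the wire : {base64_total / 1e6:.0f} MB")
print(f"amplification      : {decoded_total / base64_total:.0f}x")
```

An attacker is not bound to 20 KB frames: smaller JPEGs of the same resolution shrink the request while the decoded size stays the same, which raises the amplification factor.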
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
Related Vulnerabilities
| CVE | CVSS | Summary | Relation |
|---|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server | Same package: vllm |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue | Same package: vllm |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access | Same package: vllm |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference | Same package: vllm |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ | Same package: vllm |