Any API consumer with a valid token can crash your vLLM inference server by sending a single multimodal request with thousands of base64-encoded JPEG frames, bypassing the built-in frame limit. Patch to vLLM 0.19.0 immediately; if you can't patch, block or rate-limit multimodal endpoints at the API gateway. This is trivially exploitable and availability impact is complete.
What is the risk?
The CVSS score is Medium, but the operational impact is high for teams running vLLM in production. The attack requires only low privileges: any authenticated API user can trigger it. The request is modest on the wire (roughly 100 MB for 5,000 frames of ~20 KB each, per the exploitation scenario) yet decodes to several gigabytes in memory, a severe amplification ratio. No complex technique is required; the attacker just crafts a data URL with thousands of comma-separated frames. In multi-tenant or public-facing vLLM deployments the availability impact is critical, since a single request can OOM-kill the process serving all tenants.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vLLM | pip | >= 0.7.0, < 0.19.0 | 0.19.0 |
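To check whether a given environment falls inside the affected range, a small version comparison is usually enough. The sketch below is illustrative only: the helper names `parse_release` and `is_affected` are hypothetical, and it compares just the leading numeric release, so a pre-release such as `0.19.0rc1` is treated like `0.19.0` even though it may not contain the fix.

```python
import re

VULN_MIN = (0, 7, 0)     # first affected release, per the table above
VULN_FIXED = (0, 19, 0)  # first patched release, per the table above


def parse_release(version: str) -> tuple[int, int, int]:
    """Extract the leading X.Y[.Z] release, ignoring suffixes like '.post1' or '+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def is_affected(version: str) -> bool:
    """True when the release lies in the range >= 0.7.0, < 0.19.0."""
    return VULN_MIN <= parse_release(version) < VULN_FIXED


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("vllm")
    except PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        status = "AFFECTED" if is_affected(installed) else "outside the affected range"
        print(f"vllm {installed}: {status}")
```

Run it inside each serving image or virtual environment, since the version installed on a build host may differ from what is deployed.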
What should I do?
1. PATCH: Upgrade to vLLM 0.19.0 (PR #38636 adds frame count enforcement to the load_base64 video/jpeg path).
2. WORKAROUND (if patching is delayed): At the API gateway or reverse proxy, enforce a maximum Content-Length on POST requests to /v1/chat/completions (e.g., 50 MB) and reject requests whose body contains data:video/jpeg with more than N commas (approximate frame count check via regex).
3. RATE LIMIT: Apply per-user/per-token rate limits on multimodal endpoints to reduce blast radius.
4. MONITOR: Alert on RSS memory spikes on vLLM worker processes, or on requests containing data:video/jpeg URLs with large payloads.
5. NETWORK SEGMENTATION: Ensure vLLM endpoints are not publicly exposed without authentication; require valid tokens even for internal use.
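The gateway workaround can be sketched as a small request filter. This is a minimal illustration, not a vetted gateway plugin: `MAX_FRAMES`, `iter_strings`, `approx_frame_count`, and `should_reject` are hypothetical names, the limit of 32 simply mirrors the num_frames default cited in the advisory, and the filter parses the JSON body instead of applying a regex so that it only counts commas inside data:video/jpeg URLs. It assumes the Content-Length cap has already been enforced before the body is read.

```python
import json

MAX_FRAMES = 32  # hypothetical gateway limit; mirrors the num_frames default in the advisory
PREFIX = "data:video/jpeg"


def iter_strings(node):
    """Yield every string value found anywhere in a decoded JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def approx_frame_count(url: str) -> int:
    """Count comma-separated segments after the 'data:...;base64,' header."""
    _header, _, payload = url.partition(",")
    return payload.count(",") + 1 if payload else 0


def should_reject(body: bytes, max_frames: int = MAX_FRAMES) -> bool:
    """Return True if the request should be blocked before it reaches vLLM."""
    try:
        doc = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return True     # fail closed on bodies the gateway cannot parse
    return any(
        s.startswith(PREFIX) and approx_frame_count(s) > max_frames
        for s in iter_strings(doc)
    )
```

Failing closed on unparseable bodies is a design choice for this sketch; a gateway that fronts other endpoints may prefer to pass such requests through and let the backend return its own error.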
What does CISA's SSVC say?
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
What is CVE-2026-34755?
CVE-2026-34755 is a denial-of-service vulnerability in vLLM versions 0.7.0 up to, but not including, 0.19.0. The video/jpeg base64 path in VideoMediaIO.load_base64() does not enforce the num_frames limit, so any API consumer with a valid token can crash the inference server with a single multimodal request containing thousands of base64-encoded JPEG frames. It is fixed in vLLM 0.19.0.
Is CVE-2026-34755 actively exploited?
No confirmed active exploitation of CVE-2026-34755 has been reported, but organizations should still patch proactively.
How to fix CVE-2026-34755?
1. PATCH: Upgrade to vLLM 0.19.0 (PR #38636 adds frame count enforcement to the load_base64 video/jpeg path).
2. WORKAROUND (if patching is delayed): At the API gateway or reverse proxy, enforce a maximum Content-Length on POST requests to /v1/chat/completions (e.g., 50 MB) and reject requests whose body contains data:video/jpeg with more than N commas (approximate frame count check via regex).
3. RATE LIMIT: Apply per-user/per-token rate limits on multimodal endpoints to reduce blast radius.
4. MONITOR: Alert on RSS memory spikes on vLLM worker processes, or on requests containing data:video/jpeg URLs with large payloads.
5. NETWORK SEGMENTATION: Ensure vLLM endpoints are not publicly exposed without authentication; require valid tokens even for internal use.
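The rate-limit step could be approximated with a per-token token bucket, sketched below. The class names, the default rate, and the burst capacity are all hypothetical and would need tuning per deployment. Because a single request is enough to trigger the crash, this only limits how quickly an attacker can repeat it; it does not replace the patch or the frame-count filter.

```python
import time


class TokenBucket:
    """Classic token bucket: refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerTokenLimiter:
    """Keeps one bucket per API token (hypothetical defaults: burst of 5, 1 request per 5 s)."""

    def __init__(self, rate: float = 0.2, capacity: float = 5, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, api_token: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(api_token)
        if bucket is None:
            bucket = self.buckets[api_token] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)
```

A gateway would call `limiter.allow(token)` for each multimodal request and return HTTP 429 when it yields False. The injectable `clock` argument exists only to make the limiter testable.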
What systems are affected by CVE-2026-34755?
This vulnerability affects the following AI/ML architecture patterns: model serving, multimodal AI pipelines, agent frameworks, LLM inference API.
What is the CVSS score for CVE-2026-34755?
CVE-2026-34755 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.35%.
What is the AI security impact?
Affected AI Architectures
Model serving, multimodal AI pipelines, agent frameworks, LLM inference API
MITRE ATLAS Techniques
- AML.T0029 Denial of AI Service
- AML.T0034 Cost Harvesting
- AML.T0049 Exploit Public-Facing Application
What are the technical details?
Original Advisory
vLLM is an inference and serving engine for large language models (LLMs). From 0.7.0 to before 0.19.0, the VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py splits video/jpeg data URLs by comma to extract individual JPEG frames, but does not enforce a frame count limit. The num_frames parameter (default: 32), which is enforced by the load_bytes() code path, is completely bypassed in the video/jpeg base64 path. An attacker can send a single API request containing thousands of comma-separated base64-encoded JPEG frames, causing the server to decode all frames into memory and crash with OOM. This vulnerability is fixed in 0.19.0.
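The class of fix described here, enforcing the frame limit before any frame is decoded, can be illustrated with a short sketch. This is not vLLM's code and not the contents of the patch: the function name `split_jpeg_frames` is hypothetical, and whether the real fix rejects or truncates oversized payloads is not stated in the advisory, so rejecting is an assumption made for the example.

```python
import base64


def split_jpeg_frames(data: str, num_frames: int = 32) -> list[bytes]:
    """Split a comma-separated base64 payload into raw JPEG frames.

    `data` is the part of the data URL after the 'data:video/jpeg;base64,' header.
    The frame count is checked before decoding, so an oversized payload is
    rejected without allocating memory for its frames.
    """
    # maxsplit bounds the work: at most num_frames + 1 pieces are ever produced.
    parts = data.split(",", num_frames)
    if len(parts) > num_frames:
        raise ValueError(f"payload contains more than {num_frames} frames")
    return [base64.b64decode(part, validate=True) for part in parts]
```

The unbounded variant, a plain `data.split(",")` followed by decoding every piece, is the pattern the advisory describes as vulnerable.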
Exploitation Scenario
An attacker with a valid API token (employee, contractor, or compromised credential) crafts a POST to /v1/chat/completions containing a messages array with a content item of type image_url whose url is a data:video/jpeg;base64,<frame1>,<frame2>,...,<frame5000> string. Each frame is a small but valid base64-encoded JPEG (~20 KB encoded). The vLLM server decodes all 5,000 frames into numpy arrays (~921 KB each for 640x480 RGB), and np.stack() then allocates a combined array, so peak usage is ~4.6 GB of RAM plus a copy. The 5,000-frame payload is roughly 100 MB on the wire, which can fall within the request size limits of deployments that accept multimodal input. The server crashes with OOM, denying service to all other users. In Kubernetes deployments the pod restarts automatically, but repeated requests keep it in a crash loop.
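The figures in this scenario can be reproduced with simple arithmetic. The sketch below assumes one byte per channel (uint8 RGB), which is consistent with the ~921 KB per frame quoted above but is not stated explicitly in the advisory.

```python
FRAMES = 5000
WIDTH, HEIGHT, CHANNELS = 640, 480, 3   # RGB, assuming 1 byte per channel
ENCODED_FRAME_BYTES = 20_000            # ~20 KB per base64-encoded JPEG

decoded_frame_bytes = WIDTH * HEIGHT * CHANNELS   # 921,600 bytes per frame
decoded_total = FRAMES * decoded_frame_bytes      # 4,608,000,000 bytes
wire_total = FRAMES * ENCODED_FRAME_BYTES         # 100,000,000 bytes
amplification = decoded_total / wire_total        # about 46x

print(f"per frame decoded: {decoded_frame_bytes / 1e3:.1f} KB")
print(f"all frames decoded: {decoded_total / 1e9:.2f} GB")
print(f"stacked copy on top: {2 * decoded_total / 1e9:.2f} GB peak")
print(f"on the wire: {wire_total / 1e6:.0f} MB")
print(f"amplification: {amplification:.1f}x")
```

The peak line reflects the per-frame arrays and the stacked array coexisting briefly, which is the "plus a copy" in the scenario.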
Weaknesses (CWE)
CWE-770 Allocation of Resources Without Limits or Throttling (Primary): The product allocates a reusable resource or group of resources on behalf of an actor without imposing any intended restrictions on the size or number of resources that can be allocated.
- [Requirements] Clearly specify the minimum and maximum expectations for capabilities, and dictate which behaviors are acceptable when resource allocation reaches limits.
- [Architecture and Design] Limit the amount of resources that are accessible to unprivileged users. Set per-user limits for resources. Allow the system administrator to define these limits. Be careful to avoid CWE-410.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
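The 6.5 base score follows from this vector under the CVSS v3.1 base equations. The sketch below is a partial calculator written for illustration: it handles only Scope: Unchanged vectors (which covers this CVE), and the function names are hypothetical. The metric weights and the rounding rule are taken from the CVSS v3.1 specification.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged CVSS v3.1 vector."""
    fields = dict(part.split(":") for part in vector.split("/")[1:])
    if fields["S"] != "U":
        raise NotImplementedError("this sketch covers Scope: Unchanged only")
    c, i, a = (WEIGHTS["CIA"][fields[k]] for k in ("C", "I", "A"))
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 6.42 * iss
    exploitability = 8.22 * math.prod(WEIGHTS[k][fields[k]] for k in ("AV", "AC", "PR", "UI"))
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 6.5
```

Here the impact sub-score is 6.42 x 0.56 = 3.5952 and the exploitability sub-score is about 2.835, giving 6.43 before rounding up to 6.5.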
Related Vulnerabilities
| CVE | CVSS | Summary | Relation |
|---|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server | Same package: vllm |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue | Same package: vllm |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access | Same package: vllm |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference | Same package: vllm |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ | Same package: vllm |