CVE-2026-44222: vLLM: token injection DoS via multimodal placeholders

GHSA-hpv8-x276-m59f MEDIUM
Published May 5, 2026
CISO Take

CVE-2026-44222 is a token injection vulnerability in vLLM that allows any caller with API access to crash inference workers by embedding vision control tokens—such as <|vision_start|><|image_pad|><|vision_end|>—in a plain-text request with no actual image or video payload. When vLLM detects these placeholder tokens it attempts to index into grid dimension tensors that are empty, triggering an unhandled IndexError that terminates the GPU worker and reduces serving capacity until a manual restart. With 126 downstream dependents and a published proof-of-concept targeting vLLM 0.10.0 with Qwen2.5-VL, any organization running vision-language models via vLLM's OpenAI-compatible endpoint is exposed—whether self-hosted or behind a proxy that forwards raw user text. The CVE is not in CISA KEV and has no automated scanner template, but the exploit requires no specialized tooling beyond a single curl command with a crafted JSON body. Upgrade to vLLM 0.20.0 immediately; as a stopgap, sanitize or reject requests containing vision control tokens in text-only payloads and monitor for HTTP 500 spikes originating from rotary embedding position computation.

Sources: GitHub Advisory NVD ATLAS

What is the risk?

CVSS 6.5 Medium understates operational risk for AI serving infrastructure. Low attack complexity and a fully public PoC mean exploitation requires no specialized knowledge—a single HTTP request crashes a GPU worker. In production vLLM deployments behind load balancers without automatic worker respawning, sustained attacks can exhaust worker pools and cause a full service outage. The 42 prior CVEs in this package and a risk score of 61/100 indicate a pattern of security debt in the codebase. Risk is HIGH for any organization serving vision-language models through vLLM with publicly or semi-publicly reachable endpoints.

How does the attack unfold?

API Access
Adversary connects to the vLLM OpenAI-compatible HTTP endpoint (/v1/chat/completions), which may be exposed without authentication or with minimal low-privilege credentials.
AML.T0040
Token Injection
Adversary sends a text-only chat message embedding vision control tokens (<|vision_start|><|image_pad|><|vision_end|>) with no accompanying image or video data in the request payload.
AML.T0051.000
Exploitation
vLLM's multimodal position computation detects the vision tokens, attempts to index into the empty image_grid_thw tensor, and raises an unhandled IndexError in _vl_get_input_positions_tensor.
AML.T0049
Worker Crash
The GPU worker process terminates, the service returns HTTP 500, and inference capacity is reduced until a human operator manually restarts the process—repeating the request sustains the outage.
AML.T0029

What systems are affected?

Package Ecosystem Vulnerable Range Patched
vLLM pip >= 0.6.1, < 0.20.0 0.20.0
83.4K 130 dependents Pushed 2d ago 34% patched ~32d to patch Full package profile →

Do you use vLLM? You're affected.

How severe is it?

CVSS 3.1
6.5 / 10
EPSS
0.4%
chance of exploitation in 30 days
Higher than 33% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Trivial

What is the attack surface?

AV AC PR UI S C I A
AV Network
AC Low
PR Low
UI None
S Unchanged
C None
I None
A High

What should I do?

5 steps
  1. Patch immediately: upgrade vLLM to >= 0.20.0.

  2. If patching is not immediately possible, deploy an input validation layer that strips or rejects requests containing vision control tokens (<|vision_start|>, <|image_pad|>, <|vision_end|>, <|video_pad|>) in text-only messages before they reach the inference backend.

  3. Detection: alert on HTTP 500 responses from /v1/chat/completions correlated with IndexError in vllm.model_executor.layers.rotary_embedding logs.

  4. Enforce authentication and rate limiting on all vLLM API endpoints to reduce unauthenticated exposure.

  5. Monitor GPU worker restart frequency as a leading indicator of active exploitation attempts.

What does CISA's SSVC say?

Decision Track
Exploitation none
Automatable No
Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Art. 9 - Risk management system
ISO 42001
A.6.2 - AI risk assessment A.9.1 - Monitoring of AI systems
NIST AI RMF
GOVERN-6.2 - Policies and practices are in place for AI risk management MANAGE-2.4 - Residual risks are monitored and managed
OWASP LLM Top 10
LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2026-44222?

CVE-2026-44222 is a token injection vulnerability in vLLM that allows any caller with API access to crash inference workers by embedding vision control tokens—such as <|vision_start|><|image_pad|><|vision_end|>—in a plain-text request with no actual image or video payload. When vLLM detects these placeholder tokens it attempts to index into grid dimension tensors that are empty, triggering an unhandled IndexError that terminates the GPU worker and reduces serving capacity until a manual restart. With 126 downstream dependents and a published proof-of-concept targeting vLLM 0.10.0 with Qwen2.5-VL, any organization running vision-language models via vLLM's OpenAI-compatible endpoint is exposed—whether self-hosted or behind a proxy that forwards raw user text. The CVE is not in CISA KEV and has no automated scanner template, but the exploit requires no specialized tooling beyond a single curl command with a crafted JSON body. Upgrade to vLLM 0.20.0 immediately; as a stopgap, sanitize or reject requests containing vision control tokens in text-only payloads and monitor for HTTP 500 spikes originating from rotary embedding position computation.

Is CVE-2026-44222 actively exploited?

No confirmed active exploitation of CVE-2026-44222 has been reported, but organizations should still patch proactively.

How to fix CVE-2026-44222?

1. Patch immediately: upgrade vLLM to >= 0.20.0. 2. If patching is not immediately possible, deploy an input validation layer that strips or rejects requests containing vision control tokens (<|vision_start|>, <|image_pad|>, <|vision_end|>, <|video_pad|>) in text-only messages before they reach the inference backend. 3. Detection: alert on HTTP 500 responses from /v1/chat/completions correlated with IndexError in vllm.model_executor.layers.rotary_embedding logs. 4. Enforce authentication and rate limiting on all vLLM API endpoints to reduce unauthenticated exposure. 5. Monitor GPU worker restart frequency as a leading indicator of active exploitation attempts.

What systems are affected by CVE-2026-44222?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, multimodal AI systems, vision-language model deployments, AI API gateways, model serving.

What is the CVSS score for CVE-2026-44222?

CVE-2026-44222 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.41%.

What is the AI security impact?

Affected AI Architectures

LLM inference servingmultimodal AI systemsvision-language model deploymentsAI API gatewaysmodel serving

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service
AML.T0040 AI Model Inference API Access
AML.T0043.003 Manual Modification
AML.T0049 Exploit Public-Facing Application
AML.T0051.000 Direct

Compliance Controls Affected

EU AI Act: Art. 9
ISO 42001: A.6.2, A.9.1
NIST AI RMF: GOVERN-6.2, MANAGE-2.4
OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

vLLM is an inference and serving engine for large language models (LLMs). From 0.6.1 to before 0.20.0, there is a a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell special tokens are interpreted as control. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled IndexError and terminating the worker or degrading availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. This vulnerability is fixed in 0.20.0.

Exploitation Scenario

An adversary targeting a chatbot or AI copilot backed by vLLM with a vision-language model sends a standard OpenAI-compatible chat completion request. The message content is text-only but embeds vision placeholder tokens—<|vision_start|><|image_pad|><|vision_end|>—inside the user message string; no image attachment is provided. vLLM's multimodal router detects the vision tokens, increments the internal vision token counter, and then attempts to retrieve grid dimensions from the empty image_grid_thw tensor, triggering IndexError: list index out of range in _vl_get_input_positions_tensor. The GPU worker process exits, the service returns HTTP 500, and capacity is reduced. Repeating the request at low frequency prevents recovery, constituting a sustained DoS with minimal attacker resources and no footprint beyond HTTP logs.

Weaknesses (CWE)

CWE-129 — Improper Validation of Array Index: The product uses untrusted input when calculating or using an array index, but the product does not validate or incorrectly validates the index to ensure the index references a valid position within the array.

  • [Architecture and Design] Use an input validation framework such as Struts or the OWASP ESAPI Validation API. Note that using a framework does not automatically address all input validation problems; be mindful of weaknesses that could arise from misusing the framework itself (CWE-1173).
  • [Architecture and Design] For any security checks that are performed on the client side, ensure that these checks are duplicated on the server side, in order to avoid CWE-602. Attackers can bypass the client-side checks by modifying values after the checks have been performed, or by changing the client to remove the client-side checks entirely. Then, these modified values would be submitted to the server. Even though client-side checks provide minimal benefits with respect to server-side security, they are still useful. First, they can support intrusion detection. If the server receives input that should have been rejected by the client, then it may be an indication of an attack. Second, client-side error-checking can provide helpful feedback to the user about the expectations for valid input. Third, there may be a reduction in server-side processing time for accidental input errors, although this is typically a small savings.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Timeline

Published
May 5, 2026
Last Modified
May 13, 2026
First Seen
May 6, 2026

Related Vulnerabilities