### Summary A resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of the OpenAI-Compatible Server due to the ability to specify Jinja templates via the `chat_template`...
Full analysis pending. Showing NVD description excerpt.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vllm | pip | >= 0.5.1, < 0.11.0 | 0.11.0 |
Do you use vllm? You're affected.
Severity & Risk
Recommended Action
Patch available
Update vllm to version 0.11.0
Compliance Impact
Compliance analysis pending. Sign in for full compliance mapping when available.
Technical Details
NVD Description
### Summary A resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of the OpenAI-Compatible Server due to the ability to specify Jinja templates via the `chat_template` and `chat_template_kwargs` parameters. If an attacker can supply these parameters to the API, they can cause a service outage by exhausting CPU and/or memory resources. ### Details When using an LLM as a chat model, the conversation history must be rendered into a text input for the model. In `hf/transformer`, this rendering is performed using a Jinja template. The OpenAI-Compatible Server launched by vllm serve exposes a `chat_template` parameter that lets users specify that template. In addition, the server accepts a `chat_template_kwargs` parameter to pass extra keyword arguments to the rendering function. Because Jinja templates support programming-language-like constructs (loops, nested iterations, etc.), a crafted template can consume extremely large amounts of CPU and memory and thereby trigger a denial-of-service condition. Importantly, simply forbidding the `chat_template` parameter does not fully mitigate the issue. The implementation constructs a dictionary of keyword arguments for `apply_hf_chat_template` and then updates that dictionary with the user-supplied `chat_template_kwargs` via `dict.update`. Since `dict.update` can overwrite existing keys, an attacker can place a `chat_template` key inside `chat_template_kwargs` to replace the template that will be used by `apply_hf_chat_template`. ```python # vllm/entrypoints/openai/serving_engine.py#L794-L816 _chat_template_kwargs: dict[str, Any] = dict( chat_template=chat_template, add_generation_prompt=add_generation_prompt, continue_final_message=continue_final_message, tools=tool_dicts, documents=documents, ) _chat_template_kwargs.update(chat_template_kwargs or {}) request_prompt: Union[str, list[int]] if isinstance(tokenizer, MistralTokenizer): ... else: request_prompt = apply_hf_chat_template( tokenizer=tokenizer, conversation=conversation, model_config=model_config, **_chat_template_kwargs, ) ``` ### Impact If an OpenAI-Compatible Server exposes endpoints that accept `chat_template` or `chat_template_kwargs` from untrusted clients, an attacker can submit a malicious Jinja template (directly or by overriding `chat_template` inside `chat_template_kwargs`) that consumes excessive CPU and/or memory. This can result in a resource-exhaustion denial-of-service that renders the server unresponsive to legitimate requests. ### Fixes * https://github.com/vllm-project/vllm/pull/25794
Weaknesses (CWE)
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H