Any vLLM deployment exposing the OpenAI-compatible API to untrusted users is vulnerable to RAM exhaustion through crafted structured-output requests. Upgrade to vLLM 0.8.4 immediately; if patching is blocked, gate API access to authenticated, trusted clients only. This is low-effort to exploit and high-impact on availability of your AI inference infrastructure.
Risk Assessment
CVSS 6.5 (medium) understates operational risk for production inference servers. The attack requires only a low-privilege API account and no special AI knowledge — any authenticated user can trigger it by sending a stream of structured-output requests with unique JSON schemas. Availability impact is HIGH: successful exploitation exhausts all system RAM, crashing the inference server. For multi-tenant or internally shared vLLM deployments, one malicious insider or compromised account can take down AI services for all users.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vllm | pip | >= 0.6.5, < 0.8.4 | 0.8.4 |
If you run vllm anywhere in the >= 0.6.5, < 0.8.4 range, you are affected.
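If you are unsure which version a given host is serving, a quick check against the range above is straightforward. A minimal sketch, assuming you can run Python in the serving environment and have the third-party packaging library available:

```python
# Check the installed vllm version against the vulnerable range
# (>= 0.6.5, < 0.8.4) from the table above.
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version  # pip install packaging

try:
    v = Version(version("vllm"))
except PackageNotFoundError:
    print("vllm is not installed in this environment")
else:
    if Version("0.6.5") <= v < Version("0.8.4"):
        print(f"vllm {v}: VULNERABLE - upgrade to >= 0.8.4")
    else:
        print(f"vllm {v}: outside the affected range")
```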
Recommended Action
Five steps:
1. **Patch**: Upgrade vLLM to >= 0.8.4; this is the only complete fix.
2. **Workaround (if patching is blocked)**: Restrict the OpenAI-compatible API to trusted, authenticated clients only; block or rate-limit external access.
3. **Detection**: Monitor RAM consumption on inference nodes for sustained growth correlated with structured-output requests; alert on memory usage > 80% sustained over 5 minutes (see the monitoring sketch after this list).
4. **V0 engine hardening**: If you cannot upgrade, consider disabling the per-request guided_decoding_backend override or blocking the extra_body.guided_decoding_backend parameter at your API gateway (see the filter sketch after this list).
5. **Inventory**: Audit which internal services call vLLM's structured output endpoints and their trust level.
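For step 3, one way to watch for sustained memory growth is a small poller on each inference node. A minimal monitoring sketch using the third-party psutil library; the 80% threshold and 5-minute window mirror the guidance above, and the alert action is a placeholder to wire into your own paging system:

```python
# Alert when system RAM stays above 80% for 5 consecutive minutes.
import time

import psutil  # pip install psutil

THRESHOLD = 80.0         # percent of system RAM
WINDOW_SECONDS = 5 * 60  # sustained duration before alerting
POLL_SECONDS = 15

breach_started = None
while True:
    used = psutil.virtual_memory().percent
    if used > THRESHOLD:
        breach_started = breach_started or time.monotonic()
        if time.monotonic() - breach_started >= WINDOW_SECONDS:
            # Placeholder: replace with your real alerting hook.
            print(f"ALERT: RAM at {used:.1f}% for over 5 minutes")
            breach_started = time.monotonic()  # re-arm, keep alerting
    else:
        breach_started = None
    time.sleep(POLL_SECONDS)
```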
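For step 4, note that the OpenAI client merges extra_body keys into the top-level request JSON, so the override arrives on the wire as a top-level guided_decoding_backend key. A minimal filter sketch as a plain validation function (the function name and surrounding wiring are hypothetical; adapt the check to whatever gateway you actually run):

```python
# A hypothetical gateway-side check: reject any request body that
# carries a per-request guided_decoding_backend override.
import json

def is_blocked(raw_body: bytes) -> bool:
    """Return True if the JSON body tries to override the backend.

    The OpenAI client merges extra_body keys into the top-level JSON,
    so the override shows up as a top-level key on the wire.
    """
    try:
        payload = json.loads(raw_body or b"{}")
    except json.JSONDecodeError:
        return False  # let the upstream server reject malformed JSON
    return isinstance(payload, dict) and "guided_decoding_backend" in payload


# Example: a request that should be rejected at the gateway.
body = json.dumps({
    "model": "my-model",
    "messages": [{"role": "user", "content": "hi"}],
    "guided_decoding_backend": "xgrammar",  # the override to block
}).encode()
assert is_blocked(body)
```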
Frequently Asked Questions
What is GHSA-hf3c-wxg2-49q9?
GHSA-hf3c-wxg2-49q9 is a denial-of-service vulnerability in vLLM's structured output (guided decoding) feature. The default XGrammar backend caches every compiled grammar in RAM, so a stream of requests with unique JSON schemas grows the cache without bound. Any deployment exposing the OpenAI-compatible API to untrusted users can be driven to RAM exhaustion; upgrade to vLLM 0.8.4, or gate API access to authenticated, trusted clients if patching is blocked.
Is GHSA-hf3c-wxg2-49q9 actively exploited?
No confirmed active exploitation of GHSA-hf3c-wxg2-49q9 has been reported, but organizations should still patch proactively.
How to fix GHSA-hf3c-wxg2-49q9?
1. **Patch**: Upgrade vLLM to >= 0.8.4; this is the only complete fix.
2. **Workaround (if patching is blocked)**: Restrict the OpenAI-compatible API to trusted, authenticated clients only; block or rate-limit external access.
3. **Detection**: Monitor RAM consumption on inference nodes for sustained growth correlated with structured-output requests; alert on memory usage > 80% sustained over 5 minutes.
4. **V0 engine hardening**: If you cannot upgrade, consider disabling the per-request guided_decoding_backend override or blocking the extra_body.guided_decoding_backend parameter at your API gateway.
5. **Inventory**: Audit which internal services call vLLM's structured output endpoints and their trust level.
What systems are affected by GHSA-hf3c-wxg2-49q9?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, OpenAI-compatible API servers, Model serving, Agent frameworks, RAG pipelines.
What is the CVSS score for GHSA-hf3c-wxg2-49q9?
GHSA-hf3c-wxg2-49q9 has a CVSS v3.1 base score of 6.5 (MEDIUM).
Technical Details
NVD Description
### Impact

This report is to highlight a vulnerability in XGrammar, a library used by the structured output feature in vLLM. The XGrammar advisory is here: https://github.com/mlc-ai/xgrammar/security/advisories/GHSA-389x-67px-mjg3

The [xgrammar](https://xgrammar.mlc.ai/docs/) library is the default backend used by vLLM to support structured output (a.k.a. guided decoding). Xgrammar provides a required, built-in cache for its compiled grammars stored in RAM. xgrammar is available by default through the OpenAI compatible API server with both the V0 and V1 engines.

A malicious user can send a stream of very short decoding requests with unique schemas, resulting in an addition to the cache for each request. This can result in a Denial of Service by consuming all of the system's RAM.

Note that even if vLLM was configured to use a different backend by default, it is still possible to choose xgrammar on a per-request basis using the `guided_decoding_backend` key of the `extra_body` field of the request with the V0 engine. This per-request choice is not available when using the V1 engine.

### Patches

* https://github.com/vllm-project/vllm/pull/16283

### Workarounds

There is no way to workaround this issue in existing versions of vLLM other than preventing untrusted access to the OpenAI compatible API server.

### References

* https://github.com/mlc-ai/xgrammar/security/advisories/GHSA-389x-67px-mjg3
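For context, this is roughly what the per-request override described above looks like from a client, assuming the official openai Python package pointed at a vLLM server (the base URL, model name, and schema are placeholders; guided_json is vLLM's extra_body parameter for supplying a JSON schema):

```python
from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Return a JSON object."}],
    # extra_body keys are merged into the top-level request JSON; on the
    # V0 engine, guided_decoding_backend selects xgrammar regardless of
    # the server's configured default backend.
    extra_body={
        "guided_json": {"type": "object",
                        "properties": {"x": {"type": "string"}}},
        "guided_decoding_backend": "xgrammar",
    },
)
```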
Exploitation Scenario
An attacker with a valid API key (insider threat, stolen credential, or paying trial user) writes a script that sends hundreds of /v1/chat/completions requests per minute, each specifying a unique JSON schema in the response_format field. vLLM's XGrammar backend compiles and caches a grammar object for each unique schema in RAM with no eviction policy. Within minutes, the inference server's available memory is exhausted, causing the process to OOM-crash or the OS to kill it, resulting in a complete outage of AI inference capabilities. The attacker needs no ML expertise — only knowledge of the OpenAI structured output API format, which is publicly documented.
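To make the pattern concrete, each request in such a stream is an ordinary structured-output call whose schema differs trivially from the last, which is enough to miss the cache and force a fresh grammar compile and cache entry. A minimal sketch of the request body shape, useful for writing gateway or WAF rules; the model name is a placeholder, and any reproduction should run only against an isolated test instance you own:

```python
import json
import uuid

def unique_schema_request(model: str = "my-model") -> bytes:
    # A schema that differs only by a generated property name still
    # counts as "unique" to the grammar cache.
    prop = f"field_{uuid.uuid4().hex}"
    schema = {"type": "object", "properties": {prop: {"type": "string"}}}
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "Return JSON."}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "s", "schema": schema},
        },
    }).encode()
```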
Weaknesses (CWE)
CWE-770: Allocation of Resources Without Limits or Throttling (the grammar cache grows without bound or eviction)
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
* https://github.com/vllm-project/vllm/pull/16283 (patch)
* https://github.com/mlc-ai/xgrammar/security/advisories/GHSA-389x-67px-mjg3 (XGrammar advisory)
Related Vulnerabilities
All of the following affect the same package, vllm:

| CVE | CVSS | Summary |
|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ |