CVE-2025-29770: vLLM: DoS via unbounded grammar cache exhausts disk
GHSA-mgrm-fgjv-mhv8 (MEDIUM, PoC available)
Any authenticated user of your vLLM inference API can crash it by flooding it with structured output requests carrying unique schemas, filling the host filesystem. Upgrade to vLLM 0.8.0 immediately; the fix is available and the attack is trivial to execute. If you cannot patch now, restrict per-request backend selection and apply filesystem quotas to contain the blast radius.
Risk Assessment
Medium CVSS but practically significant for production deployments. The attack requires only low-privilege API access and is technically trivial — a simple loop with randomized JSON schemas suffices. The per-request override of the guided_decoding_backend parameter makes default-configuration mitigations ineffective without patching. Risk is elevated for multi-tenant or externally accessible vLLM deployments. Low EPSS and absence from CISA KEV suggest no active exploitation yet, but the simplicity of the exploit warrants prompt action.
Recommended Action
1. PATCH: Upgrade vLLM to >= 0.8.0; the root fix is available.
2. WORKAROUND (if patching is delayed): Block the guided_decoding_backend key in extra_body via an API gateway or middleware layer; disable the outlines backend entirely if structured output is not required.
3. RATE-LIMIT: Apply per-user/per-IP rate limits on the /v1/chat/completions endpoint.
4. ISOLATE: Run vLLM in a container with a dedicated filesystem or enforced disk quota to limit blast radius.
5. DETECT: Baseline the normal size of the outlines grammar cache directory (typically ~/.cache/outlines/) and alert on sudden or anomalous growth.
6. AUDIT: Review API access logs for users submitting high volumes of structured output requests with unique schemas.
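The WORKAROUND step above can be sketched as a small request sanitizer run in an API gateway or middleware layer before traffic reaches vLLM. This is a minimal illustration, not a drop-in component: the function name and blocked-key set are assumptions. It relies on the fact that OpenAI-compatible clients merge extra_body keys into the top level of the request JSON, which is where the override arrives at the server.

```python
import json

# Per-request override keys to strip before forwarding to vLLM.
# (Key name from the advisory; extend as needed.)
BLOCKED_KEYS = {"guided_decoding_backend"}

def sanitize_completion_body(raw_body: bytes) -> bytes:
    """Strip per-request backend overrides from a /v1/chat/completions body.

    extra_body keys are merged into the top level of the JSON payload by
    OpenAI-compatible clients, so removing them at the top level is enough.
    Assumes the body is a JSON object; invalid JSON should be rejected
    upstream before this runs.
    """
    body = json.loads(raw_body)
    for key in BLOCKED_KEYS:
        body.pop(key, None)
    return json.dumps(body).encode()
```

In practice this would live inside whatever proxy you already run (e.g. a reverse-proxy request filter); the stripping logic itself stays this simple.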
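For the DETECT step, a minimal sketch of a cache-size check that could feed an existing alerting system. The default path and the OUTLINES_CACHE_DIR override follow outlines' documented behavior; the threshold value is an illustrative assumption you should replace with your own baseline.

```python
import os
from pathlib import Path

# Default outlines cache location; outlines honors OUTLINES_CACHE_DIR if set.
CACHE_DIR = Path(os.environ.get("OUTLINES_CACHE_DIR",
                                Path.home() / ".cache" / "outlines"))

def cache_size_bytes(cache_dir: Path = CACHE_DIR) -> int:
    """Total on-disk size of the outlines grammar cache, in bytes."""
    if not cache_dir.exists():
        return 0
    return sum(f.stat().st_size for f in cache_dir.rglob("*") if f.is_file())

def check(threshold_bytes: int = 500 * 1024 * 1024) -> bool:
    """Return True when the cache exceeds the alert threshold (illustrative: 500 MiB)."""
    return cache_size_bytes() > threshold_bytes
```

Run it from cron or wrap it in your monitoring agent's custom-check format; sudden growth between runs is the signal, not absolute size alone.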
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
What is CVE-2025-29770?
CVE-2025-29770 is a denial-of-service vulnerability in vLLM's V0 engine. Any authenticated user of a vLLM inference API can flood it with structured output requests carrying unique JSON schemas; each schema is compiled and written to the outlines grammar cache on the local filesystem, until the disk fills and the server crashes. The issue is fixed in vLLM 0.8.0.
Is CVE-2025-29770 actively exploited?
No active exploitation has been observed: the EPSS score is low and CVE-2025-29770 is not in CISA's Known Exploited Vulnerabilities catalog. However, proof-of-concept exploit code is publicly available and the attack is trivial to execute, which increases the risk of exploitation.
How to fix CVE-2025-29770?
1. PATCH: Upgrade vLLM to >= 0.8.0; the root fix is available.
2. WORKAROUND (if patching is delayed): Block the guided_decoding_backend key in extra_body via an API gateway or middleware layer; disable the outlines backend entirely if structured output is not required.
3. RATE-LIMIT: Apply per-user/per-IP rate limits on the /v1/chat/completions endpoint.
4. ISOLATE: Run vLLM in a container with a dedicated filesystem or enforced disk quota to limit blast radius.
5. DETECT: Baseline the normal size of the outlines grammar cache directory (typically ~/.cache/outlines/) and alert on sudden or anomalous growth.
6. AUDIT: Review API access logs for users submitting high volumes of structured output requests with unique schemas.
What systems are affected by CVE-2025-29770?
This vulnerability affects the following AI/ML architecture patterns: model serving, LLM inference APIs, agent frameworks, RAG pipelines.
What is the CVSS score for CVE-2025-29770?
CVE-2025-29770 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.66%.
Technical Details
NVD Description
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. The outlines library is one of the backends used by vLLM to support structured output (a.k.a. guided decoding). Outlines provides an optional cache for its compiled grammars on the local filesystem. This cache has been on by default in vLLM. Outlines is also available by default through the OpenAI compatible API server. The affected code in vLLM is vllm/model_executor/guided_decoding/outlines_logits_processors.py, which unconditionally uses the cache from outlines. A malicious user can send a stream of very short decoding requests with unique schemas, resulting in an addition to the cache for each request. This can result in a Denial of Service if the filesystem runs out of space. Note that even if vLLM was configured to use a different backend by default, it is still possible to choose outlines on a per-request basis using the guided_decoding_backend key of the extra_body field of the request. This issue applies only to the V0 engine and is fixed in 0.8.0.
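The failure mode described above can be illustrated with a toy stand-in (this is not vLLM's or outlines' actual code): an unconditional disk cache keyed on the schema, where every previously unseen schema adds a new file with no size bound or eviction.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Illustrative cache location for the toy example only.
CACHE_DIR = Path(tempfile.gettempdir()) / "toy_grammar_cache"

def compile_grammar(schema: dict) -> bytes:
    # Stand-in for the expensive schema -> grammar compilation step.
    return json.dumps(schema, sort_keys=True).encode()

def cached_compile(schema: dict) -> bytes:
    """Unconditional disk cache keyed on the schema.

    A repeated schema hits the cache, but every unique schema writes a
    new file; an attacker who never repeats a schema grows the cache
    without bound.
    """
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(json.dumps(schema, sort_keys=True).encode()).hexdigest()
    path = CACHE_DIR / key
    if path.exists():
        return path.read_bytes()
    grammar = compile_grammar(schema)
    path.write_bytes(grammar)  # one new file per unique schema, forever
    return grammar
```

The fix in 0.8.0 addresses this pattern in vLLM's use of the cache; the toy merely shows why "cache keyed on attacker-controlled input, no eviction" is a disk-exhaustion primitive.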
Exploitation Scenario
An attacker with any valid API credential — including a free trial account — scripts a loop sending POST requests to the vLLM /v1/chat/completions endpoint. Each request includes a unique JSON schema in response_format and sets guided_decoding_backend=outlines in extra_body. vLLM compiles each schema and writes it to the local filesystem cache. After thousands of requests (automatable in minutes), the host disk fills up, the vLLM process crashes, and all inference is unavailable to legitimate users. No AI/ML expertise is required; the attack vector is identical to classic resource exhaustion attacks adapted for LLM inference infrastructure.
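For defenders building detections or internal test harnesses, a hedged sketch of what each request in such a flood looks like. The guided_json and guided_decoding_backend fields are vLLM's OpenAI-compatible extra parameters; the model name and schema contents are illustrative. The sketch only constructs payloads and deliberately does not send anything.

```python
import uuid

def unique_schema() -> dict:
    """Build a trivially unique JSON schema; each distinct schema forces a
    fresh grammar compilation and a new on-disk cache entry."""
    return {
        "type": "object",
        "properties": {f"field_{uuid.uuid4().hex}": {"type": "string"}},
    }

def build_request() -> dict:
    # Shape of one request in the flood (payload only; nothing is sent).
    # Model name is a placeholder.
    return {
        "model": "my-model",
        "messages": [{"role": "user", "content": "hi"}],
        "guided_json": unique_schema(),
        "guided_decoding_backend": "outlines",  # per-request backend override
        "max_tokens": 1,
    }
```

The audit signature to look for in access logs is exactly this shape: very short completions, a structured-output field on every request, and a schema that never repeats.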
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
- github.com/advisories/GHSA-mgrm-fgjv-mhv8
- nvd.nist.gov/vuln/detail/CVE-2025-29770
- github.com/vllm-project/vllm/blob/53be4a863486d02bd96a59c674bbec23eec508f6/vllm/model_executor/guided_decoding/outlines_logits_processors.py (Product)
- github.com/vllm-project/vllm/pull/14837 (Issue, Patch)
- github.com/vllm-project/vllm/security/advisories/GHSA-mgrm-fgjv-mhv8 (Patch, Vendor)
- github.com/fkie-cad/nvd-json-data-feeds (Exploit)
Related Vulnerabilities
All in the same package (vllm):
- CVE-2024-9053 (9.8): RCE via unsafe pickle deserialization in RPC server
- CVE-2024-11041 (9.8): RCE via unsafe pickle deserialization in MessageQueue
- CVE-2026-25960 (9.8): SSRF allows internal network access
- CVE-2025-47277 (9.8): RCE via exposed TCPStore in distributed inference
- CVE-2025-32444 (9.8): RCE via pickle deserialization on ZeroMQ