Any vLLM or other xgrammar-powered inference endpoint that accepts user-supplied JSON schemas is vulnerable to a memory-exhaustion DoS. The attacker needs only low-privilege access, such as a valid user session (CVSS PR:L). Upgrade to xgrammar 0.1.18 immediately; if patching is delayed, rate-limit structured-output requests and cap unique schema submissions per session. This is a low-sophistication attack: a script sending thousands of unique schemas can take down an inference node.
Risk Assessment
Medium severity in isolation, but operationally significant for AI inference infrastructure. The attack surface is broad: vLLM is widely deployed in enterprise LLM serving stacks, and the exploit requires only low-privilege API access. EPSS is low (0.003), which suggests no active exploitation yet, and the CVE is not in CISA KEV. However, the simplicity of the attack (no special knowledge needed, just unique JSON schemas) and the high availability impact on inference nodes raise the operational risk above what the 6.5 CVSS score suggests.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| xgrammar | pip | < 0.1.18 | 0.1.18 |
Do you use xgrammar? You're affected.
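A quick way to compare an installed version against the vulnerable range in the table is sketched below. This is a minimal illustration, not an official checker: the helper names `parse_version`, `is_vulnerable`, and `installed_xgrammar_status` are hypothetical, and the parser only looks at leading integers, so a pre-release such as `0.1.18rc1` is treated the same as `0.1.18`.

```python
from importlib import metadata

# First fixed release, taken from the "Patched" column of the table above.
PATCHED = (0, 1, 18)


def parse_version(text):
    """Turn '0.1.17' or '0.1.18.post1' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric segment, e.g. 'post1'
        parts.append(int(digits))
    return tuple(parts)


def is_vulnerable(version_text):
    """True when the version falls in the '< 0.1.18' range."""
    return parse_version(version_text) < PATCHED


def installed_xgrammar_status():
    """Report on the xgrammar distribution in the current environment."""
    try:
        version = metadata.version("xgrammar")
    except metadata.PackageNotFoundError:
        return "xgrammar not installed"
    state = "VULNERABLE" if is_vulnerable(version) else "patched"
    return f"xgrammar {version}: {state}"


if __name__ == "__main__":
    print(installed_xgrammar_status())
```

Run it inside the same virtual environment or container image that serves inference, since a different environment may hold a different xgrammar version.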
Severity & Risk
Attack Surface
Recommended Action
1. PATCH: Upgrade xgrammar to >= 0.1.18 (cache size limit introduced). Update vLLM to a version referencing xgrammar 0.1.18+ (see vLLM PR #16283).
2. SHORT-TERM WORKAROUND: Rate-limit structured-output (JSON schema) requests per client/session at the API gateway or load balancer layer. Restrict unique schema submissions to a reasonable bound (e.g., 50/hour per API key).
3. MONITORING: Alert on memory growth patterns on inference nodes, particularly correlated with structured-output endpoint traffic. Set OOM kill alerts.
4. NETWORK CONTROLS: Ensure inference endpoints are not publicly exposed without authentication; apply the principle of least privilege to schema-submission capabilities.
5. VERIFY: Confirm your vLLM deployment version and run `pip show xgrammar` to check the installed version.
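The short-term workaround (capping unique schema submissions per API key) could be sketched roughly as follows. This is an illustrative in-process limiter, not part of vLLM or xgrammar: the class name `UniqueSchemaLimiter`, its parameters, and the choice to hash a canonical JSON serialization are all assumptions. The default of 50 per hour mirrors the example bound above.

```python
import hashlib
import json
import time
from collections import defaultdict


class UniqueSchemaLimiter:
    """Allow at most `max_unique` distinct schemas per key within `window_s` seconds."""

    def __init__(self, max_unique=50, window_s=3600.0, clock=time.monotonic):
        self.max_unique = max_unique
        self.window_s = window_s
        self.clock = clock
        # api_key -> {schema_digest: time the digest was first seen}
        self._seen = defaultdict(dict)

    @staticmethod
    def digest(schema):
        """Hash a canonical serialization so key order does not create 'new' schemas."""
        canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def allow(self, api_key, schema):
        now = self.clock()
        seen = self._seen[api_key]
        # Forget digests whose window has elapsed.
        for old in [d for d, t in seen.items() if now - t >= self.window_s]:
            del seen[old]
        d = self.digest(schema)
        if d in seen:
            return True  # a repeated schema does not count against the budget
        if len(seen) >= self.max_unique:
            return False  # budget of distinct schemas exhausted for this key
        seen[d] = now
        return True
```

A gateway or middleware would call `allow(api_key, schema)` before forwarding a structured-output request and reject the request when it returns `False`. Note that this sketch keeps state per process and never drops idle keys, so a real deployment would likely back it with a shared store that expires entries.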
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Classification
Compliance Impact
Frequently Asked Questions
What is CVE-2025-32381?
CVE-2025-32381 is a denial-of-service vulnerability in xgrammar. The library keeps compiled grammars in an unbounded in-memory cache, so a client that sends many requests with unique JSON schemas can exhaust the memory of a vLLM or other xgrammar-powered inference host. Exploitation needs only low-privilege access (CVSS PR:L), and the issue is fixed in xgrammar 0.1.18.
Is CVE-2025-32381 actively exploited?
No confirmed active exploitation of CVE-2025-32381 has been reported, but organizations should still patch proactively.
How to fix CVE-2025-32381?
1. PATCH: Upgrade xgrammar to >= 0.1.18 (cache size limit introduced). Update vLLM to a version referencing xgrammar 0.1.18+ (see vLLM PR #16283).
2. SHORT-TERM WORKAROUND: Rate-limit structured-output (JSON schema) requests per client/session at the API gateway or load balancer layer. Restrict unique schema submissions to a reasonable bound (e.g., 50/hour per API key).
3. MONITORING: Alert on memory growth patterns on inference nodes, particularly correlated with structured-output endpoint traffic. Set OOM kill alerts.
4. NETWORK CONTROLS: Ensure inference endpoints are not publicly exposed without authentication; apply the principle of least privilege to schema-submission capabilities.
5. VERIFY: Confirm your vLLM deployment version and run `pip show xgrammar` to check the installed version.
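For the monitoring step, one simple signal is resident memory that keeps climbing across consecutive samples. The function below is a hedged sketch of that idea only: the name `sustained_growth` and both thresholds are hypothetical, and the samples are assumed to come from whatever metrics agent already reports per-process memory in MB.

```python
def sustained_growth(samples_mb, min_samples=5, min_increase_mb=512.0):
    """Return True when the most recent readings never decrease and the
    total rise across them is at least `min_increase_mb`."""
    if len(samples_mb) < min_samples:
        return False  # not enough history to judge a trend
    window = samples_mb[-min_samples:]
    never_decreases = all(later >= earlier for earlier, later in zip(window, window[1:]))
    return never_decreases and (window[-1] - window[0]) >= min_increase_mb
```

An alert rule built on this would work best when correlated with structured-output request volume, since steady growth during heavy schema traffic matches the attack pattern more closely than growth alone.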
What systems are affected by CVE-2025-32381?
This vulnerability affects the following AI/ML architecture patterns: LLM inference servers, structured output pipelines, model serving, agent frameworks.
What is the CVSS score for CVE-2025-32381?
CVE-2025-32381 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.32%.
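The 6.5 figure can be reproduced from the vector `CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H` using the CVSS v3.1 base score equations. The sketch below covers only the unchanged-scope case (S:U), which is what this vector uses; the function names are illustrative.

```python
import math

# CVSS v3.1 metric weights (PR values are those for unchanged scope).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles S:U vectors")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 6.5
```

Here the impact sub-score is 6.42 × 0.56 ≈ 3.60 (availability only) and exploitability is about 2.84, giving 6.43 before rounding up to 6.5.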
Technical Details
NVD Description
### Summary
Xgrammar includes a cache for compiled grammars to increase performance with repeated use of the same grammar. This cache is held in memory. Since the cache is unbounded, a system making use of xgrammar can be abused to fill up a host's memory and cause a denial of service. For example, sending many small requests to an LLM inference server with unique JSON schemas would eventually cause this denial of service to occur.
### Details
The fix is to add a limit to the cache size. This was done in https://github.com/mlc-ai/xgrammar/pull/243. An example of making use of the new cache size limit can be found in vLLM here: https://github.com/vllm-project/vllm/pull/16283
### Impact
Any system making use of Xgrammar and taking requests as input from potentially untrusted parties would be vulnerable to this denial of service issue.
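To illustrate the general idea behind a cache size limit, here is a minimal least-recently-used cache with a fixed entry count. It is a conceptual sketch only and does not reproduce xgrammar's actual implementation or API: the class `BoundedGrammarCache`, the `compile_fn` callback, and the entry-count limit are assumptions made for the example (a real limit might be expressed in bytes instead).

```python
from collections import OrderedDict


class BoundedGrammarCache:
    """Cache compiled results, evicting the least recently used entry when full."""

    def __init__(self, compile_fn, max_entries=128):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._compile = compile_fn  # hypothetical stand-in for grammar compilation
        self._max = max_entries
        self._entries = OrderedDict()

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = self._compile(key)
        self._entries[key] = value
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # drop the oldest entry
        return value

    def __len__(self):
        return len(self._entries)
```

With a bound in place, a flood of unique schemas costs repeated compilation time but can no longer grow the cache without limit, which is why rate limiting remains useful even after patching.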
Exploitation Scenario
An adversary with low-privilege API access to a vLLM inference endpoint (e.g., a free-tier or trial user) writes a script generating thousands of structurally unique JSON schemas — each schema with slightly different property names or nesting. Each request to the `/v1/chat/completions` endpoint with a unique `response_format.json_schema` triggers xgrammar to compile and cache a new grammar object. With no eviction policy, the cache grows unbounded. After ~10,000-50,000 requests (depending on schema complexity and host RAM), the host's memory is exhausted, the inference process is OOM-killed, and the endpoint becomes unavailable for all users. The attack is fully automatable, requires no special AI/ML knowledge, and can be executed from a single low-bandwidth connection.
Weaknesses (CWE)
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
Timeline
Related Vulnerabilities
| CVE | CVSS | Summary | Relation |
|---|---|---|---|
| CVE-2025-57809 | 7.5 | xgrammar: uncontrolled recursion in grammar parsing causes DoS | Same package: xgrammar |
| CVE-2025-58446 | | xgrammar: DoS via oversized JSON schema grammar parsing | Same package: xgrammar |
| CVE-2026-25048 | | xgrammar: security flaw enables exploitation | Same package: xgrammar |
| CVE-2026-33660 | 10.0 | TensorFlow: type confusion NPD in tensor conversion | Same attack type: DoS |
| CVE-2022-35939 | 9.8 | TensorFlow: ScatterNd OOB write enables RCE/crash | Same attack type: DoS |