CVE-2025-3225: llama-index Papers Loader: XML expansion DoS

GHSA-w42r-mrx7-c633 HIGH CISA: TRACK*
Published July 7, 2025
CISO Take

Any RAG or document ingestion pipeline that uses llama-index-readers-papers below 0.3.2 to process sitemaps is vulnerable to a billion-laughs (XML entity expansion) DoS that can crash the service through memory exhaustion. A fix is available: upgrade to llama-index-readers-papers >= 0.3.2 (llama-index >= 0.12.29) now. No exploitation has been observed in the wild yet, but the attack is trivial to craft.

What is the risk?

CVSS rates this 7.5 (High), but real-world risk is moderate. The EPSS score of 0.00144 is low; EPSS estimates the probability of exploitation in the next 30 days, so this signals a low predicted likelihood rather than confirmed attacks. The attack vector is network-accessible and requires no authentication or user interaction, which makes it a zero-friction DoS wherever the parser is exposed to untrusted input. The impact is limited to availability: there is no data exposure or code execution. Risk rises significantly for teams running automated document ingestion pipelines that accept external URLs or user-submitted sitemaps.

What systems are affected?

Package                      Ecosystem  Vulnerable range  Patched
llama-index-readers-papers   pip        < 0.3.2           0.3.2

Do you use llama-index-readers-papers below 0.3.2? You're affected.
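To check whether a given Python environment is exposed, a minimal sketch like the following compares the installed llama-index-readers-papers version with the 0.3.2 fix using only the standard library. The function names are hypothetical, and the simplified version parser ignores pre-release tags (it would treat 0.3.2rc1 as patched), so prefer packaging.version for a real audit.

```python
from importlib import metadata

PATCHED = (0, 3, 2)  # first fixed release of llama-index-readers-papers


def parse_version(version: str) -> tuple[int, ...]:
    """Simplified parser: keep leading numeric components ("0.3.1" -> (0, 3, 1))."""
    parts = []
    for chunk in version.split("."):
        digits = ""
        for ch in chunk:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def papers_reader_status(dist: str = "llama-index-readers-papers") -> str:
    """Return "not installed", "vulnerable", or "patched" for this environment."""
    try:
        installed = metadata.version(dist)
    except metadata.PackageNotFoundError:
        return "not installed"
    return "vulnerable" if parse_version(installed) < PATCHED else "patched"


if __name__ == "__main__":
    print(papers_reader_status())
```

Run it inside each service's own virtual environment, since each one may pin a different version.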

How severe is it?

CVSS 3.1: 7.5 / 10
EPSS: 0.4% chance of exploitation in 30 days (higher than 33% of all CVEs)
Exploitation status: Exploit available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation confidence: medium
CISA SSVC: Public PoC
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

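Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High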
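The 7.5 score follows directly from these metrics. As a worked check, the sketch below applies the CVSS v3.1 base-score formula for Scope: Unchanged to the vector AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H. Only availability carries impact (ISS = 0.56, Impact = 6.42 × 0.56 ≈ 3.595), exploitability is 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.887, and the sum ≈ 7.482 rounds up to 7.5. The helper names are illustrative, and the sketch rejects Scope: Changed vectors rather than implementing them. The base weights are the same in v3.0, whose simpler round-up also yields 7.5 here.

```python
import math

# CVSS v3.1 base-metric weights (FIRST specification). PR uses the
# Scope: Unchanged values, which is all this sketch supports.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, robust to float error."""
    int_input = round(x * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute a CVSS v3.x base score for a Scope: Unchanged vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics.get("S") != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])  # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # 7.5
```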

What should I do?

6 steps
  1. Patch immediately: upgrade llama-index-readers-papers to >= 0.3.2 or llama-index to >= 0.12.29.

  2. If patching must wait, disable or sandbox the Papers Loader in the meantime.

  3. Compensating control: apply XML entity limits at the parser level (Python: use defusedxml or set entity expansion limits).

  4. Validate and allowlist sitemap URLs before processing — reject untrusted or user-supplied URLs.

  5. Monitor document ingestion workers for abnormal memory spikes as a detection signal.

  6. Audit your dependency tree for llama-index-readers-papers usage across all services.
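As an illustration of step 3, the sketch below extracts <loc> URLs from a sitemap with the standard-library expat parser and refuses any document that declares a DOCTYPE, following the CWE-776 advice to prohibit DTDs outright. It is a stand-in under stated assumptions, not the llama-index patch: parse_sitemap_locs and DTDForbidden are hypothetical names, namespaced prefixes such as sm:loc are not handled, and defusedxml remains the more complete option where a third-party dependency is acceptable.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when untrusted sitemap input contains a DOCTYPE declaration."""


def parse_sitemap_locs(xml_bytes: bytes) -> list[str]:
    """Return the <loc> URLs in a sitemap, refusing any document with a DTD.

    Hypothetical helper for illustration; it is not part of llama-index.
    Element names are matched without namespace processing, so a prefixed
    tag such as <sm:loc> is not recognized.
    """
    parser = xml.parsers.expat.ParserCreate()
    locs: list[str] = []
    buf: list[str] = []
    in_loc = False

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort before any entity declaration is read, let alone expanded.
        raise DTDForbidden(f"DOCTYPE {name!r} not allowed in sitemap input")

    def on_start(name, attrs):
        nonlocal in_loc
        if name == "loc":
            in_loc = True
            buf.clear()

    def on_end(name):
        nonlocal in_loc
        if name == "loc":
            in_loc = False
            locs.append("".join(buf).strip())

    def on_text(data):
        if in_loc:  # character data may arrive in several chunks
            buf.append(data)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_bytes, True)
    return locs


if __name__ == "__main__":
    sitemap = (b'<?xml version="1.0"?>'
               b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
               b"<url><loc>https://example.org/paper1</loc></url></urlset>")
    print(parse_sitemap_locs(sitemap))  # ['https://example.org/paper1']
```

Rejecting the DOCTYPE at its first token means a billion-laughs payload is refused before a single entity is declared, so no expansion limit has to be tuned.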

What does CISA's SSVC say?

Decision: Track*
Exploitation: poc
Automatable: Yes
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Art. 9 - Risk management system
ISO 42001
6.1.2 - AI risk assessment
NIST AI RMF
MANAGE 2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM05:2025 - Improper Output Handling

Frequently Asked Questions

What is CVE-2025-3225?

CVE-2025-3225 is an XML entity expansion ("billion laughs") vulnerability in the sitemap parser of llama-index-readers-papers before 0.3.2. An attacker who can get a malicious sitemap parsed can exhaust memory and crash the ingestion service. Upgrade to llama-index-readers-papers >= 0.3.2 (llama-index >= 0.12.29).

Is CVE-2025-3225 actively exploited?

No confirmed active exploitation of CVE-2025-3225 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-3225?

Upgrade llama-index-readers-papers to >= 0.3.2 (or llama-index to >= 0.12.29). If patching must wait, disable or sandbox the Papers Loader, enforce XML entity limits (for example with defusedxml), allowlist sitemap URLs, monitor ingestion workers for abnormal memory spikes, and audit every service's dependency tree for llama-index-readers-papers.
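The URL allowlisting control can be sketched with urllib.parse from the standard library. ALLOWED_HOSTS and is_allowed_sitemap_url are hypothetical; populate the set with the exact hosts your pipeline is meant to crawl. Comparing the parsed hostname exactly, rather than matching substrings, is what rejects look-alike hosts and userinfo tricks.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: replace with the hosts your ingestion pipeline should crawl.
ALLOWED_HOSTS = frozenset({"arxiv.org", "export.arxiv.org"})


def is_allowed_sitemap_url(url: str) -> bool:
    """Accept only HTTPS URLs whose exact hostname is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. malformed IPv6 literals
        return False
    if parts.scheme != "https":
        return False
    # .hostname drops any "user@" prefix and port, and lowercases the host.
    return parts.hostname in ALLOWED_HOSTS


if __name__ == "__main__":
    for candidate in [
        "https://arxiv.org/sitemap.xml",
        "http://arxiv.org/sitemap.xml",          # plain HTTP
        "https://arxiv.org.evil.example/s.xml",  # look-alike host
        "https://arxiv.org@evil.example/s.xml",  # userinfo trick
    ]:
        print(candidate, is_allowed_sitemap_url(candidate))
```

An allowlist limits who can feed the parser but does not make the parser safe, so it complements rather than replaces the upgrade.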

What systems are affected by CVE-2025-3225?

This vulnerability affects the following AI/ML architecture patterns: RAG pipelines, document ingestion pipelines, LLM agent frameworks, automated data loaders.

What is the CVSS score for CVE-2025-3225?

CVE-2025-3225 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 0.41%.

What is the AI security impact?

Affected AI Architectures

RAG pipelines, document ingestion pipelines, LLM agent frameworks, automated data loaders

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0029 Denial of AI Service
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art. 9
ISO 42001: 6.1.2
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM05:2025

What are the technical details?

Original Advisory

An XML Entity Expansion vulnerability, also known as a 'billion laughs' attack, exists in the sitemap parser of the run-llama/llama_index repository, specifically affecting the Papers Loaders package before version 0.3.2 (in llama-index v0.10.0 and above through v0.12.29). This vulnerability allows an attacker to supply a malicious Sitemap XML, leading to a Denial of Service (DoS) by exhausting system memory and potentially causing a system crash. The issue is resolved in version 0.3.2 (in llama-index 0.12.29).

Exploitation Scenario

An adversary targeting an organization's RAG pipeline identifies that it uses llama-index to ingest papers from external sitemaps. They craft a malicious XML sitemap containing deeply nested entity references (the classic billion-laughs structure) and then either submit it through a public-facing document upload endpoint, inject its URL into an automated pipeline that crawls academic sources, or host it on a compromised domain that the pipeline is configured to ingest. When the Papers Loader parses the sitemap, each level of nesting multiplies the expanded size, so entity expansion consumes all available memory, crashing the ingestion worker and halting RAG knowledge base updates.
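To see why such a small file is enough, the sketch below builds the classic billion-laughs sitemap as a string and computes, without parsing anything, how large the innermost reference would expand. With nine levels that each repeat the previous entity ten times, the 3-character string "lol" grows to 3 × 10^9 characters (about 3 GB) from a payload of well under 1 KB. The function name and the sitemap wrapper are illustrative; do not feed the output to an unhardened parser.

```python
def billion_laughs(levels: int = 9, fanout: int = 10) -> tuple[str, int]:
    """Return (payload, expanded length of the root reference). Nothing is parsed here."""
    decls = ['<!ENTITY lol0 "lol">']
    expanded = {"lol0": len("lol")}
    for i in range(1, levels + 1):
        # Each level references the previous entity `fanout` times.
        body = f"&lol{i - 1};" * fanout
        decls.append(f'<!ENTITY lol{i} "{body}">')
        expanded[f"lol{i}"] = expanded[f"lol{i - 1}"] * fanout
    subset = "\n".join(decls)
    payload = (
        '<?xml version="1.0"?>\n'
        f"<!DOCTYPE lolz [\n{subset}\n]>\n"
        f"<urlset><url><loc>&lol{levels};</loc></url></urlset>"
    )
    return payload, expanded[f"lol{levels}"]


if __name__ == "__main__":
    payload, size = billion_laughs()
    print(f"payload: {len(payload)} bytes, expands to {size:,} characters")
```

The growth is fanout ** levels, so each extra level multiplies memory use by ten while adding under 100 bytes to the file.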

Weaknesses (CWE)

CWE-776 — Improper Restriction of Recursive Entity References in DTDs ('XML Entity Expansion'): The product uses XML documents and allows their structure to be defined with a Document Type Definition (DTD), but it does not properly control the number of recursive definitions of entities.

  • [Operation] If possible, prohibit the use of DTDs or use an XML parser that limits the expansion of recursive DTD entities.
  • [Implementation] Before parsing XML files with associated DTDs, scan for recursive entity declarations and do not continue parsing potentially explosive content.

Source: MITRE CWE corpus.
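The [Implementation] mitigation can be sketched as a pre-parse scan: read only the DTD, estimate how far each internal entity would expand, and stop before any of it is expanded. In the sketch below, check_entity_expansion, the one-million-character cap, the 100-entity cap, and counting undeclared references (such as &amp;) at their literal length are all assumptions. Sizes are resolved only after the whole DTD is read, so declaring entities in reverse order does not evade the check. It bounds each entity individually; a production check would also bound total expansion across the document body, or simply reject DTDs altogether.

```python
import re
import xml.parsers.expat

ENTITY_REF = re.compile(r"&([A-Za-z_][\w.\-]*);")


class EntityExpansionTooLarge(ValueError):
    """Raised when the DTD could expand past the configured limits."""


class _StopScan(Exception):
    """Internal signal: the DTD has been read, stop before any content."""


def check_entity_expansion(xml_bytes: bytes, max_expanded: int = 1_000_000,
                           max_entities: int = 100) -> dict[str, int]:
    """Estimate each general entity's expanded length from the DTD alone."""
    values: dict[str, str] = {}

    def on_entity(name, is_param, value, base, system_id, public_id, notation):
        if value is None:  # external or unparsed entity: refuse outright
            raise EntityExpansionTooLarge(f"external entity {name!r} not allowed")
        if not is_param:  # only general entities appear as &name; references
            values[name] = value
        if len(values) > max_entities:  # also bounds the recursion depth below
            raise EntityExpansionTooLarge("too many entity declarations")

    def stop(*_args):
        raise _StopScan

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.EndDoctypeDeclHandler = stop  # DTD finished; nothing expanded yet
    parser.StartElementHandler = stop    # reached only when there is no DOCTYPE
    try:
        parser.Parse(xml_bytes, True)
    except _StopScan:
        pass

    sizes: dict[str, int] = {}

    def expanded(name: str, stack: frozenset[str]) -> int:
        if name in sizes:
            return sizes[name]
        if name in stack:
            raise EntityExpansionTooLarge(f"recursive entity {name!r}")
        value, total, pos = values[name], 0, 0
        for match in ENTITY_REF.finditer(value):
            total += match.start() - pos  # literal text before the reference
            ref = match.group(1)
            # Declared entities (in any order) count at their expanded size.
            total += expanded(ref, stack | {name}) if ref in values else len(match.group(0))
            pos = match.end()
            if total > max_expanded:
                raise EntityExpansionTooLarge(f"entity {name!r} exceeds {max_expanded:,} chars")
        total += len(value) - pos
        if total > max_expanded:
            raise EntityExpansionTooLarge(f"entity {name!r} exceeds {max_expanded:,} chars")
        sizes[name] = total
        return total

    for entity_name in values:
        expanded(entity_name, frozenset())
    return sizes
```

Stopping at the end of the DOCTYPE, rather than at the first element, matters: expat expands entity references in attribute values before reporting the element, so the scan must finish before the root start tag is read.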

CVSS Vector

CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

Timeline

Published: July 7, 2025
Last Modified: July 7, 2025
First Seen: March 24, 2026

Related Vulnerabilities