CVE-2025-3225: llama-index Papers Loader: XML expansion DoS

GHSA-w42r-mrx7-c633 HIGH CISA: TRACK*
Published July 7, 2025
CISO Take

Any RAG or document ingestion pipeline that uses a version of llama-index-readers-papers older than 0.3.2 to process sitemaps is vulnerable to a billion-laughs DoS that can crash the service through memory exhaustion. A fix is available: upgrade to llama-index-readers-papers >= 0.3.2 (llama-index >= 0.12.29) now. No exploitation has been observed in the wild yet, but the attack is trivial to craft.

Risk Assessment

The CVSS base score is 7.5 (High), but real-world risk is moderate. An EPSS score of 0.00144 indicates a low predicted probability of exploitation. The attack vector is the network, with no authentication and no user interaction required, so the DoS is zero-friction wherever the parser is exposed to untrusted input. The impact is limited to availability: there is no data exposure or code execution. Risk rises significantly for teams running automated document ingestion pipelines that accept external URLs or user-submitted sitemaps.

Affected Systems

Package                     Ecosystem  Vulnerable Range  Patched
llama-index-readers-papers  pip        < 0.3.2           0.3.2
49.3K · 229 dependents · Pushed yesterday · 87% patched · ~50d to patch

If you use llama-index-readers-papers below version 0.3.2, you're affected.

Severity & Risk

CVSS 3.1
7.5 / 10
EPSS
0.3%
chance of exploitation in 30 days
Higher than 57% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Trivial
Exploitation Confidence
medium
CISA SSVC: Public PoC
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV Network
AC Low
PR None
UI None
S Unchanged
C None
I None
A High

Recommended Action

  1. Patch immediately: upgrade llama-index-readers-papers to >= 0.3.2 or llama-index to >= 0.12.29.

  2. If patching is delayed, disable or sandbox the Papers Loader until patched.

  3. Compensating control: apply XML entity limits at the parser level (Python: use defusedxml or set entity expansion limits).

  4. Validate and allowlist sitemap URLs before processing — reject untrusted or user-supplied URLs.

  5. Monitor document ingestion workers for abnormal memory spikes as a detection signal.

  6. Audit your dependency tree for llama-index-readers-papers usage across all services.
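
The compensating control in step 3 can be sketched with the standard library alone. The advisory recommends defusedxml; the version below is a swapped-in, stdlib-only approximation that refuses any document declaring its own entities before handing it to ElementTree. The names parse_sitemap_safely and EntityDeclarationForbidden are hypothetical, and this is not the code used in the upstream patch.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class EntityDeclarationForbidden(ValueError):
    """Raised when an XML document declares its own entities."""


def _reject_entity(name, is_parameter_entity, value, base,
                   system_id, public_id, notation_name):
    # expat calls this once per <!ENTITY ...> declaration in the DTD.
    # Raising here aborts the parse before any entity is expanded.
    raise EntityDeclarationForbidden(f"entity declaration not allowed: {name!r}")


def parse_sitemap_safely(xml_bytes: bytes) -> ET.Element:
    """Parse sitemap XML, rejecting documents that declare entities.

    Hypothetical helper: a pre-check pass with expat, then a normal
    ElementTree parse of the same bytes if the pre-check succeeds.
    """
    checker = expat.ParserCreate()
    checker.EntityDeclHandler = _reject_entity
    checker.Parse(xml_bytes, True)  # raises on entity declarations or malformed XML
    return ET.fromstring(xml_bytes)
```

A legitimate sitemap has no reason to declare entities, so rejecting them outright is usually simpler than trying to tune expansion limits. Where defusedxml is available, its ElementTree wrapper is likely the more thoroughly tested choice.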

CISA SSVC Assessment

Decision Track*
Exploitation poc
Automatable Yes
Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Art. 9 - Risk management system
ISO 42001
6.1.2 - AI risk assessment
NIST AI RMF
MANAGE 2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM03:2025 - Supply Chain / LLM05:2025 - Improper Output Handling

Frequently Asked Questions

What is CVE-2025-3225?

CVE-2025-3225 is an XML entity expansion ("billion laughs") vulnerability in the sitemap parser of the llama-index Papers Loader package (llama-index-readers-papers) before version 0.3.2. An attacker who can supply a malicious sitemap XML can exhaust system memory and crash the service. The issue is fixed in llama-index-readers-papers 0.3.2 (llama-index 0.12.29). No exploitation has been observed in the wild yet, but the attack is trivial to craft.

Is CVE-2025-3225 actively exploited?

No confirmed active exploitation of CVE-2025-3225 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-3225?

  1. Patch immediately: upgrade llama-index-readers-papers to >= 0.3.2 or llama-index to >= 0.12.29.

  2. If patching is delayed, disable or sandbox the Papers Loader until patched.

  3. Compensating control: apply XML entity limits at the parser level (Python: use defusedxml or set entity expansion limits).

  4. Validate and allowlist sitemap URLs before processing: reject untrusted or user-supplied URLs.

  5. Monitor document ingestion workers for abnormal memory spikes as a detection signal.

  6. Audit your dependency tree for llama-index-readers-papers usage across all services.
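
For the patch and audit steps, a quick per-environment check might look like the following. It is a minimal sketch that compares the installed version against the fixed release 0.3.2 using only the standard library. The helper names are hypothetical, and the version parsing is deliberately simple: it reads the leading numeric release segment and ignores pre-release suffixes.

```python
import re
from importlib import metadata

PACKAGE = "llama-index-readers-papers"
FIXED = (0, 3, 2)  # first patched release, per this advisory


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '0.3.1.post1' -> (0, 3, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if not match:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_vulnerable(version: str) -> bool:
    """True if the version sorts before the fixed release."""
    return release_tuple(version) < FIXED


def installed_status(package: str = PACKAGE) -> str:
    """Report whether the package is installed here and, if so, whether it is patched."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    state = "VULNERABLE" if is_vulnerable(version) else "ok"
    return f"{package} {version}: {state}"


if __name__ == "__main__":
    print(installed_status())
```

Running this inside each service's virtual environment or container image gives a per-deployment answer. It does not replace a lockfile or SBOM scan across repositories.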

What systems are affected by CVE-2025-3225?

This vulnerability affects the following AI/ML architecture patterns: RAG pipelines, document ingestion pipelines, LLM agent frameworks, automated data loaders.

What is the CVSS score for CVE-2025-3225?

CVE-2025-3225 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 0.34%.

Technical Details

NVD Description

An XML Entity Expansion vulnerability, also known as a 'billion laughs' attack, exists in the sitemap parser of the run-llama/llama_index repository, specifically affecting the Papers Loaders package before version 0.3.2 (in llama-index v0.10.0 and above through v0.12.29). This vulnerability allows an attacker to supply a malicious Sitemap XML, leading to a Denial of Service (DoS) by exhausting system memory and potentially causing a system crash. The issue is resolved in version 0.3.2 (in llama-index 0.12.29).
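
To see why a small file can exhaust memory, it helps to work out the amplification. The sketch below assumes the commonly cited billion-laughs shape: a 3-byte base entity and nine further entities, each referencing the previous one ten times. The advisory does not give these numbers; they are illustrative only.

```python
def expanded_bytes(base_len: int, fanout: int, levels: int) -> int:
    """Size of the fully expanded top-level entity.

    Each level references the level below it `fanout` times, so the
    expansion multiplies by `fanout` at every level.
    """
    return base_len * fanout ** levels


# Assumed classic shape: "lol" (3 bytes), 10 references per level, 9 levels.
classic = expanded_bytes(base_len=3, fanout=10, levels=9)
print(f"{classic:,} bytes (~{classic / 1e9:.0f} GB)")
# prints: 3,000,000,000 bytes (~3 GB)
```

A document on the order of a kilobyte can therefore demand gigabytes once a parser expands entities eagerly, which is the memory exhaustion the description refers to.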

Exploitation Scenario

An adversary targeting an organization's RAG pipeline identifies that it uses llama-index to ingest papers from external sitemaps. They craft a malicious XML sitemap containing deeply nested entity references (classic billion-laughs structure) and either submit it via a public-facing document upload endpoint, inject the URL into an automated pipeline that crawls academic sources, or host it on a compromised domain the pipeline is configured to ingest. When the Papers Loader parses the sitemap, recursive entity expansion consumes all available memory, crashing the ingestion worker and halting RAG knowledge base updates.
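
Two of the delivery paths in this scenario depend on the pipeline fetching a sitemap from an attacker-chosen URL. A host allowlist check, applied before any fetch, can be sketched as below. The host names and the function name are placeholders, not values from the advisory.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: replace with the hosts your pipeline is meant to ingest.
ALLOWED_SITEMAP_HOSTS = frozenset({"example.org", "papers.example.org"})


def is_allowed_sitemap_url(url: str, allowed_hosts=ALLOWED_SITEMAP_HOSTS) -> bool:
    """Accept only https URLs whose host is exactly on the allowlist."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        # urlsplit rejects some malformed inputs (for example, bad IPv6 literals).
        return False
    if parts.scheme != "https":
        return False
    # Exact match only: "example.org.evil.net" must not pass as "example.org".
    return host in allowed_hosts
```

An allowlist does not help when an allowed host is itself compromised, the third path in the scenario, so it complements parser hardening rather than replacing it.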

CVSS Vector

CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
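
The 7.5 base score can be reproduced from this vector. The sketch below follows the base-score equations and metric weights as published in the CVSS v3.x specification, using the v3.1 rounding rule, and it handles only Scope Unchanged vectors such as this one. The function names are hypothetical.

```python
import math

# Metric weights from the CVSS v3.x specification (PR values are for Scope Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, using integer arithmetic as in v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope Unchanged CVSS v3.x vector."""
    fields = dict(part.split(":") for part in vector.split("/"))
    if fields.get("S") != "U":
        raise ValueError("this sketch only handles Scope Unchanged (S:U)")
    w = {metric: WEIGHTS[metric][fields[metric]] for metric in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # prints 7.5
```

Here the impact term is 6.42 × 0.56 ≈ 3.60 and the exploitability term is about 3.89. Their sum, about 7.48, rounds up to 7.5.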

Timeline

Published
July 7, 2025
Last Modified
July 7, 2025
First Seen
March 24, 2026

Related Vulnerabilities