CVE-2024-12704: llama-index: DoS via infinite loop in LangChain LLM

GHSA-j3wr-m6xh-64hg | HIGH | PoC available | CISA SSVC: Track*
Published March 20, 2025
CISO Take

Any production service using llama-index with LangChain LLM streaming is exposed to a process hang with zero authentication required; an attacker need only send a single wrong-typed input. Upgrade llama-index-core to 0.12.6 or later immediately; if you cannot patch now, disable the streaming endpoint or gate it behind strict input validation. EPSS is low (0.27%), but the exploit is trivial and the blast radius covers every RAG and agent pipeline using this integration.

Risk Assessment

HIGH severity (CVSS 7.5) with a trivial exploitation path: network-accessible, no privileges, no user interaction. The impact is availability only; there is no data exposure or privilege escalation. An EPSS of 0.00271 (0.27%) suggests no observed mass exploitation yet, but the attack primitive (sending a wrong-typed input to a streaming endpoint) requires zero AI/ML expertise. Risk is elevated for any team running llama-index in production with LangChain LLM wrappers behind public-facing APIs.

Affected Systems

Package            Ecosystem   Vulnerable Range   Patched
llama-index-core   pip         < 0.12.6           0.12.6
llamaindex         pip         (not specified)    No patch

Severity & Risk

CVSS 3.1 Base Score: 7.5 / 10 (HIGH)
EPSS: 0.27% chance of exploitation in the next 30 days (higher than 58% of all CVEs)
Exploitation Status: Exploit available; public PoC indexed (trickest/cve)
Exploitation Likelihood: Medium (confidence: medium)
Sophistication: Trivial
CISA SSVC: Public PoC

Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

Attack Vector (AV):        Network
Attack Complexity (AC):    Low
Privileges Required (PR):  None
User Interaction (UI):     None
Scope (S):                 Unchanged
Confidentiality (C):       None
Integrity (I):             None
Availability (A):          High

Recommended Action

6 steps
  1. PATCH

    Upgrade llama-index-core to >= 0.12.6 (patch commit d1ecfb77). This is the only complete fix.
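
    For pip-managed deployments this is a one-line upgrade (verify afterwards with the audit check in step 6):

      pip install --upgrade "llama-index-core>=0.12.6"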

  2. WORKAROUND

    If an immediate patch is not possible: replace calls to stream_complete on LangChainLLM instances with the synchronous complete, and remove streaming endpoints from public exposure.
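
    A minimal sketch of the swap, assuming a handler that currently streams. The handler name and the ChatOpenAI backend are illustrative (any LangChain LLM applies), and the import path follows the modular llama-index packaging; adjust to your version:

      from langchain_openai import ChatOpenAI            # illustrative backend
      from llama_index.llms.langchain import LangChainLLM

      llm = LangChainLLM(llm=ChatOpenAI())

      def handle_prompt(prompt: str) -> str:
          # Before (vulnerable): consuming llm.stream_complete(prompt) can
          # hang forever if the worker thread dies on a wrong-typed input.
          # After (workaround): complete() is synchronous and does not use
          # the streaming callback thread at all.
          return llm.complete(prompt).text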

  3. INPUT VALIDATION

    Add type-checking middleware to reject malformed inputs before they reach LLM wrappers.
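
    One possible shape for that guard, as a plain function in front of the wrapper (the function name is ours; schema-validating frameworks such as FastAPI with a declared prompt: str field achieve the same effect declaratively):

      def require_prompt(raw: object) -> str:
          # Reject anything that is not a non-empty string before it can
          # reach LangChainLLM.stream_complete; a wrong-typed input is
          # exactly the trigger for the hang described in this CVE.
          if not isinstance(raw, str) or not raw.strip():
              raise ValueError("prompt must be a non-empty string")
          return raw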

  4. CIRCUIT BREAKER

    Implement per-request timeouts (e.g., 30s) and process-level watchdogs (e.g., supervisord, Kubernetes liveness probes) to auto-restart hung workers.
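
    An in-process approximation of the per-request timeout (names are ours). Note that Python cannot kill a hung thread, so the timeout only frees the request path; the process-level watchdog is still needed to recycle the worker:

      from concurrent.futures import ThreadPoolExecutor, TimeoutError

      _pool = ThreadPoolExecutor(max_workers=8)

      def complete_with_timeout(llm, prompt: str, timeout_s: float = 30.0) -> str:
          future = _pool.submit(llm.complete, prompt)
          try:
              return future.result(timeout=timeout_s).text
          except TimeoutError:
              # The underlying thread may still be stuck; fail the request
              # and let the liveness probe restart the worker if it recurs.
              raise RuntimeError(f"LLM call exceeded {timeout_s}s; worker may be hung")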

  5. DETECTION

    Monitor for LLM inference worker threads that do not terminate within expected latency windows; alert on CPU spikes correlated with incomplete LLM responses.
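
    A minimal in-process watchdog along those lines (thresholds and names are illustrative; a real deployment would emit to its metrics system instead of logging):

      import logging, threading, time

      EXPECTED_MAX_LATENCY_S = 60.0
      _inflight: dict[int, float] = {}   # request id -> start time
      _lock = threading.Lock()

      def track(request_id: int) -> None:
          with _lock:
              _inflight[request_id] = time.monotonic()

      def done(request_id: int) -> None:
          with _lock:
              _inflight.pop(request_id, None)

      def watchdog() -> None:
          # Background thread: flag requests that have been in flight
          # longer than the expected latency window.
          while True:
              now = time.monotonic()
              with _lock:
                  stuck = [r for r, t0 in _inflight.items()
                           if now - t0 > EXPECTED_MAX_LATENCY_S]
              for r in stuck:
                  logging.warning("request %s in flight > %ss; possible hang",
                                  r, EXPECTED_MAX_LATENCY_S)
              time.sleep(5)

      threading.Thread(target=watchdog, daemon=True).start()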

  6. AUDIT

    Inventory all internal services importing llama-index and check version with: pip show llama-index-core | grep Version
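
    The same check from inside an environment, in Python (assumes the packaging library, which ships alongside pip, is importable):

      from importlib.metadata import PackageNotFoundError, version
      from packaging.version import Version

      try:
          installed = Version(version("llama-index-core"))
          status = "patched" if installed >= Version("0.12.6") else "VULNERABLE"
          print(f"llama-index-core {installed}: {status}")
      except PackageNotFoundError:
          print("llama-index-core not installed in this environment")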

CISA SSVC Assessment

Decision: Track*
Exploitation: PoC
Automatable: Yes
Technical Impact: Partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act: Article 9 (Risk management system)
ISO/IEC 42001: 8.4 (AI system operation)
NIST AI RMF: MANAGE-2.2 (Mechanisms are in place to respond to risks identified in AI systems)
OWASP LLM Top 10: LLM04 (Model Denial of Service)

Frequently Asked Questions

What is CVE-2024-12704?

CVE-2024-12704 is a denial-of-service vulnerability in the LangChainLLM class of llama-index (llama-index-core before 0.12.6). A wrong-typed input passed to stream_complete can terminate the worker thread before _llm.predict runs; with no exception handling for that case, get_response_gen loops forever and the process hangs. No authentication is required, and the fix is to upgrade llama-index-core to 0.12.6 or later.

Is CVE-2024-12704 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2024-12704, increasing the risk of exploitation.

How to fix CVE-2024-12704?

1. PATCH: Upgrade llama-index-core to >= 0.12.6 (patch commit d1ecfb77); this is the only complete fix.
2. WORKAROUND: If an immediate patch is not possible, replace calls to stream_complete on LangChainLLM instances with the synchronous complete, and remove streaming endpoints from public exposure.
3. INPUT VALIDATION: Add type-checking middleware to reject malformed inputs before they reach LLM wrappers.
4. CIRCUIT BREAKER: Implement per-request timeouts (e.g., 30s) and process-level watchdogs (e.g., supervisord, Kubernetes liveness probes) to auto-restart hung workers.
5. DETECTION: Monitor for LLM inference worker threads that do not terminate within expected latency windows; alert on CPU spikes correlated with incomplete LLM responses.
6. AUDIT: Inventory all internal services importing llama-index and check versions with: pip show llama-index-core | grep Version

What systems are affected by CVE-2024-12704?

This vulnerability affects the following AI/ML architecture patterns: RAG pipelines, agent frameworks, LLM serving (streaming), document processing pipelines, chatbot backends.

What is the CVSS score for CVE-2024-12704?

CVE-2024-12704 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 0.27%.

Technical Details

NVD Description

A vulnerability in the LangChainLLM class of the run-llama/llama_index repository, version v0.12.5, allows for a Denial of Service (DoS) attack. The stream_complete method executes the llm using a thread and retrieves the result via the get_response_gen method of the StreamingGeneratorCallbackHandler class. If the thread terminates abnormally before the _llm.predict is executed, there is no exception handling for this case, leading to an infinite loop in the get_response_gen function. This can be triggered by providing an input of an incorrect type, causing the thread to terminate and the process to continue running indefinitely.
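
The failure pattern reduces to a consumer polling a queue that its producer thread, having died on the bad input, will never feed. An illustrative reduction of that pattern (generic code, not llama-index internals; names are ours):

  import queue
  import threading

  def vulnerable_pattern(bad_input):
      q: queue.Queue = queue.Queue()

      def producer():
          # Mimics the worker thread: a wrong-typed input raises here,
          # before a token or end-of-stream sentinel is ever enqueued.
          bad_input.strip()   # AttributeError when bad_input is not a str
          q.put(None)         # sentinel: never reached

      threading.Thread(target=producer, daemon=True).start()

      # Mimics get_response_gen: polls the queue until a sentinel arrives.
      # Nothing checks whether the producer is still alive, so once it
      # dies this loop spins forever (the CPU spike noted under DETECTION).
      while True:
          try:
              if q.get_nowait() is None:
                  return
          except queue.Empty:
              continue

  vulnerable_pattern(12345)   # never returns: one request hangs the worker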

Exploitation Scenario

An adversary identifies a public-facing endpoint (chatbot, document Q&A, or RAG API) built on llama-index. They send an HTTP request with a malformed payload — for example, passing an integer or list where the LangChainLLM wrapper expects a string prompt. The LangChainLLM.stream_complete method launches a background thread that crashes before _llm.predict executes. The main thread, waiting in get_response_gen, enters an infinite loop with no exit condition. The worker process hangs indefinitely. The attacker repeats the request to exhaust all available workers, bringing the service down. No authentication, no AI/ML knowledge, and no special tooling required — a single malformed HTTP request is sufficient.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

Timeline

Published: March 20, 2025
Last Modified: February 24, 2026
First Seen: March 20, 2025

Related Vulnerabilities