CVE-2024-2965: langchain-community: DoS via recursive sitemap loop
GHSA-3hjh-jh2h-vrg6 MEDIUM CISA: TRACK*
If your LangChain-based application uses SitemapLoader with user-controlled URLs, an attacker can crash the Python process by submitting a self-referencing sitemap, taking the service offline. Upgrade langchain-community to 0.2.5 or later. Low urgency for environments where sitemap URLs are hardcoded or admin-controlled; high urgency for multi-tenant or user-driven data ingestion pipelines.
What is the risk?
Low-to-medium real-world risk. CVSS 4.2 with AV:P (physical attack vector) reflects conservative NVD scoring, but any deployment accepting user-supplied sitemap URLs is effectively remotely exploitable. An EPSS of 0.00038 and absence from CISA KEV indicate no observed in-the-wild exploitation. The primary impact is availability: a single malicious input can crash the entire AI service process. Risk multiplies in multi-tenant architectures, where one user can affect all others.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| LangChain | pip | >= 0, < 0.2.5 | 0.2.5 |
| LangChain Community | pip | < 0.2.5 | 0.2.5 |
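To check an environment against the ranges in the table above, a small script can compare the installed version with the patched release. The following is a minimal sketch using only the standard library; the helper names (`release_tuple`, `is_vulnerable`, `installed_status`) are illustrative, and the comparison looks only at the leading numeric release segment, so a pre-release such as `0.2.5rc1` is treated as `0.2.5`. Where the third-party `packaging` library is available, its `Version` class handles those cases more rigorously.

```python
import re
from importlib import metadata

PATCHED = (0, 2, 5)  # first fixed release according to the table above


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '0.2.5rc1' -> (0, 2, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_vulnerable(version: str) -> bool:
    """True when the release segment sorts below 0.2.5."""
    return release_tuple(version) < PATCHED


def installed_status(package: str = "langchain-community"):
    """Return (version, vulnerable) for an installed package, or None if absent."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return version, is_vulnerable(version)


if __name__ == "__main__":
    for name in ("langchain", "langchain-community"):
        print(name, installed_status(name))
```

The same predicate could back a CI gate that fails the build when `installed_status` reports a vulnerable version.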
What should I do?
1) Patch: Upgrade langchain and langchain-community to >= 0.2.5; this is the only complete fix.
2) Audit: Grep the codebase for SitemapLoader usage and determine whether URLs are user-controlled or hardcoded.
3) Workaround (if patching is delayed): Wrap parse_sitemap calls in try/except RecursionError with circuit-breaker logic, or enforce a URL allowlist for permitted sitemaps.
4) Detect: Alert on RecursionError exceptions in AI service logs; unexpected occurrences may indicate active exploitation attempts.
5) Dependency scanning: Add langchain-community to SCA tooling with a fail-build rule for versions < 0.2.5.
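The workaround and detection steps above can be sketched as a thin guard around whatever function performs the sitemap load. This is an illustration under stated assumptions, not LangChain code: `ALLOWED_HOSTS`, `SitemapRejected`, `is_allowed`, and `guarded_load` are hypothetical names, and `load` stands for any callable you supply, such as a small function that builds a `SitemapLoader` for the URL and calls its `load()` method.

```python
import logging
from urllib.parse import urlsplit

logger = logging.getLogger("sitemap_guard")

# Hypothetical allowlist; replace with the hosts your deployment trusts.
ALLOWED_HOSTS = frozenset({"docs.example.com"})


class SitemapRejected(Exception):
    """Raised when a sitemap URL is refused or its load is aborted."""


def is_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Accept only http(s) URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and parts.hostname in allowed_hosts


def guarded_load(url: str, load, allowed_hosts=ALLOWED_HOSTS):
    """Call load(url) only for allowlisted URLs and contain RecursionError."""
    if not is_allowed(url, allowed_hosts):
        raise SitemapRejected(f"sitemap host not allowlisted: {url}")
    try:
        return load(url)
    except RecursionError as exc:
        # Logged at ERROR level so log-based alerting can pick it up.
        logger.error("RecursionError while loading sitemap %s", url)
        raise SitemapRejected(f"recursive sitemap suspected: {url}") from exc
```

Catching `RecursionError` keeps the process alive, but each attempt still spends time and requests before the limit is reached, so the allowlist is the stronger of the two controls until the upgrade is in place.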
What does CISA's SSVC say?
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
Is CVE-2024-2965 actively exploited?
No confirmed active exploitation of CVE-2024-2965 has been reported, but organizations should still patch proactively.
What systems are affected by CVE-2024-2965?
This vulnerability affects the following AI/ML architecture patterns: RAG pipelines, agent frameworks, document processing pipelines.
What is the CVSS score for CVE-2024-2965?
CVE-2024-2965 has a CVSS v3.0 base score of 4.2 (MEDIUM), matching the CVSS:3.0 vector listed under CVSS Vector. The EPSS exploitation probability is 0.30%.
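The 4.2 figure can be reproduced from the vector AV:P/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:H listed under CVSS Vector. The sketch below applies the CVSS v3 base-score formula for unchanged scope; the metric weights are the same in v3.0 and v3.1. The function and constant names are illustrative. Passing the Network weight instead of Physical shows how much the score moves if the attack vector is read as remote, which is the situation for deployments that accept user-supplied sitemap URLs.

```python
import math

# CVSS v3 metric weights used by this vector.
AV_PHYSICAL = 0.20
AV_NETWORK = 0.85
AC_HIGH = 0.44
PR_NONE = 0.85
UI_NONE = 0.85
CONF_NONE, INTEG_NONE, AVAIL_HIGH = 0.0, 0.0, 0.56


def roundup(value: float) -> float:
    """Round up to one decimal place, as the base-score formula requires."""
    return math.ceil(value * 10) / 10


def base_score(attack_vector: float) -> float:
    """Base score for scope unchanged, with AC:H/PR:N/UI:N and C:N/I:N/A:H."""
    exploitability = 8.22 * attack_vector * AC_HIGH * PR_NONE * UI_NONE
    iss = 1 - (1 - CONF_NONE) * (1 - INTEG_NONE) * (1 - AVAIL_HIGH)
    impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score(AV_PHYSICAL))  # 4.2, the published score
print(base_score(AV_NETWORK))   # 5.9, same vector with AV:N
```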
What is the AI security impact?
Affected AI Architectures
MITRE ATLAS Techniques
- AML.T0010.001 AI Software
- AML.T0029 Denial of AI Service
- AML.T0049 Exploit Public-Facing Application
Compliance Controls Affected
What are the technical details?
Original Advisory
Denial of service in `SitemapLoader` Document Loader in the `langchain-community` package, affecting versions below 0.2.5. The `parse_sitemap` method, responsible for parsing sitemaps and extracting URLs, lacks a mechanism to prevent infinite recursion when a sitemap URL refers to the current sitemap itself. This oversight allows for the possibility of an infinite loop, leading to a crash by exceeding the maximum recursion depth in Python. This vulnerability can be exploited to occupy server socket/port resources and crash the Python process, impacting the availability of services relying on this functionality.
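The failure mode described in the advisory can be reproduced in isolation, without LangChain or any network access. The following toy model is not the library's code: `SITEMAPS`, `naive_parse`, and `demonstrate` are hypothetical names, and the dictionary stands in for fetched sitemaps. It shows only the general pattern, a recursive traversal that follows child sitemaps with no cycle or depth check.

```python
# Toy model: each sitemap URL maps to the child sitemap URLs it lists.
# The URLs are placeholders; nothing here touches the network.
SITEMAPS = {
    "https://example.test/sitemap.xml": ["https://example.test/sitemap.xml"],
}


def naive_parse(url: str, sitemaps: dict) -> list:
    """Follow every child sitemap recursively, with no cycle or depth check."""
    pages = []
    for child in sitemaps.get(url, []):
        pages.extend(naive_parse(child, sitemaps))
    return pages


def demonstrate() -> bool:
    """Return True if the self-referencing entry exhausts the recursion limit."""
    try:
        naive_parse("https://example.test/sitemap.xml", SITEMAPS)
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    print("recursion limit hit:", demonstrate())
```

In the real loader each level of recursion also involves fetching the sitemap again, which is why the advisory mentions socket and port resources in addition to the crash.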
Exploitation Scenario
An adversary targeting a RAG-powered product that allows users to specify external documentation sources submits a URL pointing to a crafted sitemap.xml that references itself (e.g., <sitemap><loc>https://attacker.com/sitemap.xml</loc></sitemap>). LangChain's SitemapLoader fetches it and calls parse_sitemap recursively with no depth limit or cycle detection. After roughly 1,000 recursive calls (Python's default recursion limit), Python raises RecursionError; if nothing catches it, the process crashes. In a shared SaaS environment, this takes down the service for all tenants simultaneously. Attacker effort is minimal: a single HTTP request with a self-referencing sitemap is sufficient.
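A traversal that tracks visited URLs and caps depth is not affected by the self-referencing sitemap in this scenario. The sketch below is one way to write such a traversal and is not presented as the upstream fix. Its assumptions: sitemaps use the standard `http://www.sitemaps.org/schemas/sitemap/0.9` namespace, `fetch` is a caller-supplied function returning the XML text for a URL, and the names `crawl`, `max_depth`, and `max_sitemaps` are hypothetical. The standard-library XML parser is used for brevity and is not hardened against maliciously constructed XML.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def _locs(xml_text: str, path: str) -> list:
    """Return the text of every <loc> element matched by path."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(path, NS) if loc.text]


def crawl(start_url: str, fetch, max_depth: int = 5, max_sitemaps: int = 100) -> list:
    """Collect page URLs iteratively, skipping sitemaps that were already seen."""
    seen = set()
    pages = []
    stack = [(start_url, 0)]
    while stack:
        url, depth = stack.pop()
        if url in seen or depth > max_depth or len(seen) >= max_sitemaps:
            continue  # cycle, too deep, or sitemap budget exhausted
        seen.add(url)
        xml_text = fetch(url)
        pages.extend(_locs(xml_text, "sm:url/sm:loc"))
        for child in _locs(xml_text, "sm:sitemap/sm:loc"):
            stack.append((child, depth + 1))
    return pages


if __name__ == "__main__":
    SELF = "https://attacker.example/sitemap.xml"
    XML = (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        f"<sitemap><loc>{SELF}</loc></sitemap></sitemapindex>"
    )
    # The self-reference is fetched once, then skipped as already seen.
    print(crawl(SELF, lambda url: XML))
```

Using an explicit stack instead of recursion means the Python recursion limit is never involved, and the `max_sitemaps` budget bounds the number of fetches even when an attacker generates an endless chain of distinct URLs.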
Weaknesses (CWE)
CWE-400 — Uncontrolled Resource Consumption: The product does not properly control the allocation and maintenance of a limited resource.
- [Architecture and Design] Design throttling mechanisms into the system architecture. The best protection is to limit the amount of resources that an unauthorized user can cause to be expended. A strong authentication and access control model will help prevent such attacks from occurring in the first place. The login application should be protected against DoS attacks as much as possible. Limiting the database access, perhaps by caching result sets, can help minimize the resources expended. To further limit the potential for a DoS attack, consider tracking the rate of requests received from users and blocking requests that exceed a defined rate threshold.
- [Architecture and Design] Mitigation of resource exhaustion attacks requires that the target system either: recognizes the attack and denies that user further access for a given amount of time, or uniformly throttles all requests in order to make it more difficult to consume resources more quickly than they can again be freed. The first of these solutions is an issue in itself though, since it may allow attackers to prevent the use of the system by a particular valid user. If the attacker impersonates the valid user, they may be able to prevent the user from accessing the server in question. The second solution is simply difficult to effectively institute -- and even when properly done, it does not provide a full solution. It simply makes the attack require more resources on the part of the attacker.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.0/AV:P/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:H
References
- github.com/advisories/GHSA-3hjh-jh2h-vrg6
- github.com/langchain-ai/langchain/commit/73c42306745b0831aa6fe7fe4eeb70d2c2d87a82
- github.com/langchain-ai/langchain/commit/9a877c7adbd06f90a2518152f65b562bd90487cc
- github.com/langchain-ai/langchain/pull/22903
- github.com/pypa/advisory-database/tree/main/vulns/langchain/PYSEC-2024-118.yaml
- huntr.com/bounties/90b0776d-9fa6-4841-aac4-09fde5918cae
- nvd.nist.gov/vuln/detail/CVE-2024-2965
Related Vulnerabilities
| CVE | Score | Summary | Relation |
|---|---|---|---|
| CVE-2025-2828 | 10.0 | LangChain RequestsToolkit: SSRF exposes cloud metadata | Same package: langchain |
| CVE-2023-34541 | 9.8 | LangChain: RCE via unsafe load_prompt deserialization | Same package: langchain |
| CVE-2023-29374 | 9.8 | LangChain: RCE via prompt injection in LLMMathChain | Same package: langchain |
| CVE-2023-34540 | 9.8 | LangChain: RCE via JiraAPIWrapper crafted input | Same package: langchain |
| CVE-2023-36258 | 9.8 | LangChain: unauthenticated RCE via code injection | Same package: langchain |