CVE-2025-6985: langchain-text-splitters: XXE enables arbitrary file read
GHSA-m42m-m8cr-8m58 · HIGH · PoC AVAILABLE · CISA: TRACK*
Upgrade langchain-text-splitters to 0.3.9 immediately. Any deployment that uses HTMLSectionSplitter with user-supplied or external XSLT is exposed, with no authentication required. The flaw gives an unauthenticated attacker a direct path to reading SSH keys, API credentials, and .env files accessible to your LangChain process. If you cannot patch now, remove custom XSLT input at the application layer and restrict the process's filesystem access.
Risk Assessment
High risk for organizations running LangChain-based pipelines that accept external or user-controlled XSLT input. CVSS 7.5 with network vector, no privileges, and no user interaction makes this trivially exploitable against exposed endpoints. EPSS of 0.00235 suggests no active widespread exploitation yet, but the attack surface is broad given LangChain's adoption in production AI systems. The primary risk is credential and secrets exfiltration rather than code execution, with cloud metadata endpoints (AWS IMDS, GCP) representing a critical secondary exposure.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| langchain-text-splitters | pip | < 0.3.9 | 0.3.9 |
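
A minimal sketch for checking whether an installed version falls in the vulnerable range listed in the table. It assumes plain `X.Y.Z` release strings and ignores pre-release suffixes; the helper names `parse_release`, `is_vulnerable`, and `installed_is_vulnerable` are hypothetical and not part of any LangChain tooling.

```python
import re
from importlib import metadata

PATCHED = (0, 3, 9)  # first fixed release according to the table above


def parse_release(version: str) -> tuple:
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_vulnerable(version: str) -> bool:
    """True if the version sorts below the patched release."""
    return parse_release(version) < PATCHED


def installed_is_vulnerable(package: str = "langchain-text-splitters"):
    """Check the installed distribution; None means it is not installed."""
    try:
        return is_vulnerable(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None
```

Calling `installed_is_vulnerable()` inside each service's virtual environment gives a quick yes/no answer. A dependency scanner is the more robust choice for a whole fleet.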
Any installation of langchain-text-splitters below 0.3.9 is affected if it uses HTMLSectionSplitter with custom XSLT.
Recommended Action
1. PATCH: Upgrade langchain-text-splitters to >= 0.3.9 immediately.
2. WORKAROUND: Disable or restrict custom XSLT input paths in your application until the patch is applied.
3. LEAST PRIVILEGE: Run LangChain processes with restricted filesystem access: no access to ~/.ssh/, .env files, or instance metadata paths.
4. NETWORK CONTROL: Block outbound HTTP/HTTPS from LangChain processes to the cloud metadata address (169.254.169.254).
5. DETECTION: Monitor for unusual file access from LangChain processes, especially to /etc/passwd, ~/.ssh/, home directories, and metadata endpoints.
6. AUDIT: Inventory all services using HTMLSectionSplitter and check whether XSLT input originates from user-controlled or external sources.
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
What is CVE-2025-6985?
CVE-2025-6985 is an XML External Entity (XXE) vulnerability in the HTMLSectionSplitter class of langchain-text-splitters. The class parses arbitrary XSLT stylesheets with lxml without hardening, which lets a remote, unauthenticated attacker read any file the LangChain process can reach, such as SSH keys, API credentials, and .env files. Upgrade to 0.3.9; if you cannot patch now, remove custom XSLT input at the application layer and restrict the process's filesystem access.
Is CVE-2025-6985 actively exploited?
The low EPSS score (0.00235) suggests no widespread active exploitation yet. However, proof-of-concept exploit code is publicly available for CVE-2025-6985, which increases the risk of exploitation.
How to fix CVE-2025-6985?
1. PATCH: Upgrade langchain-text-splitters to >= 0.3.9 immediately.
2. WORKAROUND: Disable or restrict custom XSLT input paths in your application until the patch is applied.
3. LEAST PRIVILEGE: Run LangChain processes with restricted filesystem access: no access to ~/.ssh/, .env files, or instance metadata paths.
4. NETWORK CONTROL: Block outbound HTTP/HTTPS from LangChain processes to the cloud metadata address (169.254.169.254).
5. DETECTION: Monitor for unusual file access from LangChain processes, especially to /etc/passwd, ~/.ssh/, home directories, and metadata endpoints.
6. AUDIT: Inventory all services using HTMLSectionSplitter and check whether XSLT input originates from user-controlled or external sources.
What systems are affected by CVE-2025-6985?
langchain-text-splitters versions below 0.3.9 are affected. The vulnerability is relevant to the following AI/ML architecture patterns: RAG pipelines, document ingestion pipelines, LLM agent frameworks, and data preprocessing pipelines.
What is the CVSS score for CVE-2025-6985?
CVE-2025-6985 has a CVSS base score of 7.5 (HIGH); the published vector string uses CVSS v3.0. The EPSS exploitation probability is about 0.2% (0.00235).
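
As a worked check, the 7.5 score can be reproduced from the vector string listed under CVSS Vector. The sketch below uses the metric weights and simple round-up of the CVSS v3.0 base-score equations and handles only Scope: Unchanged, which is all this vector needs. `base_score` is a hypothetical helper, not a full CVSS implementation.

```python
import math

# Metric weights from the CVSS v3.0 specification (PR values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.0 vector with Scope: Unchanged."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {key: WEIGHTS[key][value] for key, value in metrics.items() if key in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    # Round up to one decimal place.
    return math.ceil(min(impact + exploitability, 10) * 10) / 10


print(base_score("CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"))  # 7.5
```

Here the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.89, so the sum of about 7.48 rounds up to 7.5. The score is driven entirely by confidentiality impact, which matches the read-only nature of the flaw.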
Technical Details
NVD Description
The HTMLSectionSplitter class in langchain-text-splitters version 0.3.8 is vulnerable to XML External Entity (XXE) attacks due to unsafe XSLT parsing. This vulnerability arises because the class allows the use of arbitrary XSLT stylesheets, which are parsed using lxml.etree.parse() and lxml.etree.XSLT() without any hardening measures. In lxml versions up to 4.9.x, external entities are resolved by default, allowing attackers to read arbitrary local files or perform outbound HTTP(S) fetches. In lxml versions 5.0 and above, while entity expansion is disabled, the XSLT document() function can still read any URI unless XSLTAccessControl is applied. This vulnerability allows remote attackers to gain read-only access to any file the LangChain process can reach, including sensitive files such as SSH keys, environment files, source code, or cloud metadata. No authentication, special privileges, or user interaction are required, and the issue is exploitable in default deployments that enable custom XSLT.
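
The hardening measures the description refers to can be sketched as follows, for applications that load XSLT with lxml themselves. This is a sketch under assumptions, not necessarily how the 0.3.9 patch is implemented: it parses the stylesheet with entity resolution, DTD loading, and network access disabled, and applies an `XSLTAccessControl` that denies file and network access from within the stylesheet. The function name `build_hardened_transform` is hypothetical.

```python
from lxml import etree


def build_hardened_transform(xslt_source: bytes) -> etree.XSLT:
    """Compile a stylesheet with entity resolution and document() access disabled."""
    # Parser that does not resolve entities, load DTDs, or fetch over the network.
    parser = etree.XMLParser(resolve_entities=False, no_network=True, load_dtd=False)
    xslt_tree = etree.fromstring(xslt_source, parser)

    # Deny file and network reads/writes from inside the stylesheet,
    # which covers the XSLT document() function.
    access = etree.XSLTAccessControl(
        read_file=False,
        write_file=False,
        create_dir=False,
        read_network=False,
        write_network=False,
    )
    return etree.XSLT(xslt_tree, access_control=access)
```

The two controls address the two paths named in the description: the parser options cover external entity expansion on older lxml versions, and the access control covers `document()` on newer ones. Neither replaces upgrading or refusing untrusted stylesheets altogether.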
Exploitation Scenario
An adversary submits a crafted HTML document containing a malicious embedded XSLT stylesheet to a RAG document ingestion endpoint. The XSLT instructs lxml to read /app/.env (containing LLM API keys and database credentials) or the AWS IMDS endpoint at http://169.254.169.254/latest/meta-data/iam/security-credentials/ to retrieve temporary IAM credentials. File contents surface in parsed output or error responses, enabling full credential exfiltration with no authentication, privileges, or user interaction. In batch pipelines, the attack can be embedded in a document uploaded to a monitored S3 bucket or shared drive, triggering passively on the next ingestion run.
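
For the detection side of this scenario, a cheap pre-filter can flag stylesheets that contain the constructs such an attack relies on before they reach lxml. The sketch below is a text heuristic only: the pattern list and the name `flag_risky_xslt` are assumptions, it expects the conventional `xsl:` prefix, and it can be evaded or produce false positives, so it complements patching and access control rather than replacing them.

```python
import re

# Constructs commonly involved in XXE and XSLT file-read attacks (heuristic list).
RISKY_PATTERNS = {
    "DOCTYPE declaration": re.compile(r"<!DOCTYPE", re.IGNORECASE),
    "entity declaration": re.compile(r"<!ENTITY", re.IGNORECASE),
    "document() call": re.compile(r"\bdocument\s*\("),
    "xsl:import / xsl:include": re.compile(r"<xsl:(?:import|include)\b"),
}


def flag_risky_xslt(stylesheet_text: str) -> list:
    """Return the labels of all risky constructs found in the stylesheet text."""
    return [
        label
        for label, pattern in RISKY_PATTERNS.items()
        if pattern.search(stylesheet_text)
    ]
```

An ingestion service could reject or quarantine any submission for which `flag_risky_xslt` returns a non-empty list and log the labels for later review.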
CVSS Vector
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
References
- github.com/advisories/GHSA-m42m-m8cr-8m58
- github.com/langchain-ai/langchain/commit/43eef435505a1c907227b724c0c760ad5fc01790
- github.com/langchain-ai/langchain/pull/31819
- nvd.nist.gov/vuln/detail/CVE-2025-6985
- huntr.com/bounties/cf78abbb-df3b-43de-b6ee-132b73ff8331
- github.com/ARPSyndicate/cve-scores (tagged: Exploit)
- github.com/fkie-cad/nvd-json-data-feeds (tagged: Exploit)
Related Vulnerabilities
- CVE-2025-2828 (10.0), LangChain RequestsToolkit: SSRF exposes cloud metadata. Same package: langchain
- CVE-2023-34540 (9.8), LangChain: RCE via JiraAPIWrapper crafted input. Same package: langchain
- CVE-2023-29374 (9.8), LangChain: RCE via prompt injection in LLMMathChain. Same package: langchain
- CVE-2023-34541 (9.8), LangChain: RCE via unsafe load_prompt deserialization. Same package: langchain
- CVE-2023-36258 (9.8), LangChain: unauthenticated RCE via code injection. Same package: langchain