GHSA-fv5p-p927-qmxr: langchain-text-splitters: SSRF bypass exposes cloud metadata
GHSA-fv5p-p927-qmxr MEDIUM
A Server-Side Request Forgery bypass in LangChain's HTMLHeaderTextSplitter lets attackers use HTTP redirects from a server they control to reach internal network resources, including AWS IMDSv1 credential endpoints, that the built-in validate_safe_url() check was explicitly designed to block. With 2,448 downstream dependents and a package risk score of 77/100, this affects a broad cross-section of LangChain-based document ingestion and RAG pipelines that accept user-supplied URLs. No public exploit exists and the advisory is not in CISA KEV, but the attack requires no privileges and has low complexity: any team relying on split_text_from_url() and its validate_safe_url() check for SSRF protection on a vulnerable version is exposed. Patch to langchain-text-splitters >= 1.1.2 (with langchain-core >= 1.2.31) immediately and audit all usage of split_text_from_url() for user-controlled inputs.
Risk Assessment
Medium by CVSS (6.5), but operationally higher risk for cloud-deployed AI applications — particularly those running on AWS with IMDSv1 enabled. The vulnerability defeats an explicitly provided security control rather than exploiting a missing one, making it a trust-breaking bug with compounding impact. No authentication is required, attack complexity is low, and the blast radius spans 2,448 downstream packages. Risk escalates sharply when applications surface Document content back to the URL supplier, enabling credential exfiltration in a single interaction.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| langchain-text-splitters | pip | < 1.1.2 | 1.1.2 |
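A quick way to check an environment against the range in the table is to compare the installed release with 1.1.2. The sketch below is illustrative only: the helper names are hypothetical, and the parser looks only at the leading numeric release, so pre-release or local version suffixes are not handled.

```python
import re
from importlib import metadata

PATCHED = (1, 1, 2)  # first fixed release of langchain-text-splitters per the table


def parse_release(version: str) -> tuple:
    """Turn '1.1.2' into (1, 1, 2), ignoring any suffix after the numeric release."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(piece) for piece in match.group(0).split("."))


def is_vulnerable(version: str) -> bool:
    """True when the release sorts before the patched release."""
    return parse_release(version) < PATCHED


def installed_is_vulnerable(dist: str = "langchain-text-splitters"):
    """Check the installed distribution; returns None when it is not installed."""
    try:
        return is_vulnerable(metadata.version(dist))
    except metadata.PackageNotFoundError:
        return None
```

Calling `installed_is_vulnerable()` inside the application's virtual environment reports on the version that is actually importable there, which may differ from what a lock file declares.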
Recommended Action
1) Upgrade langchain-text-splitters to >= 1.1.2 and langchain-core to >= 1.2.31 immediately. The fix replaces requests.get() with an SSRF-safe httpx transport that validates DNS results and pins connections on every request, including redirect targets.
2) Migrate away from the now-deprecated split_text_from_url(): fetch HTML manually using an SSRF-safe HTTP client and pass the content directly to split_text().
3) Enable AWS IMDSv2 on all EC2 instances running LangChain workloads to require token headers and neutralize the IMDSv1 exposure path.
4) Audit all codepaths that accept external URLs for AI document processing pipelines.
5) Detect exploitation attempts by monitoring for outbound HTTP requests to 169.254.169.254 or other link-local and RFC1918 ranges originating from LangChain services.
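For the manual-fetch migration, the key property is that every hop is validated, not only the first URL. The following is a minimal sketch under stated assumptions, not the library's implementation: `fetch_html`, `assert_public_url`, and the `send` callable are hypothetical names, and `send` stands for any function that performs one HTTP request without following redirects and returns `(status, location, body)`.

```python
import ipaddress
import socket
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_public_address(ip_text: str) -> bool:
    """True only for globally routable, non-multicast addresses."""
    ip = ipaddress.ip_address(ip_text.split("%")[0])  # drop any IPv6 zone id
    return ip.is_global and not ip.is_multicast


def assert_public_url(url: str, resolve=socket.getaddrinfo) -> None:
    """Raise ValueError unless the URL is http(s) and resolves only to public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    for info in resolve(parts.hostname, port, type=socket.SOCK_STREAM):
        address = info[4][0]
        if not is_public_address(address):
            raise ValueError(f"{parts.hostname} resolves to non-public {address}")


def fetch_html(url: str, send, resolve=socket.getaddrinfo, max_redirects: int = 5) -> str:
    """Follow redirects by hand, validating every hop before it is requested.

    `send` is a hypothetical callable: one request, redirects disabled,
    returning (status_code, location_header_or_None, body_text).
    """
    for _ in range(max_redirects + 1):
        assert_public_url(url, resolve)
        status, location, body = send(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # handles relative Location values
            continue
        return body
    raise ValueError("too many redirects")
```

The returned HTML can then be passed to `HTMLHeaderTextSplitter.split_text()`. Note that this sketch resolves the hostname and connects in separate steps, so it does not close a DNS-rebinding window; the patched transport described in step 1 pins the connection to the validated IP, which this example does not attempt.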
Frequently Asked Questions
What is GHSA-fv5p-p927-qmxr?
A Server-Side Request Forgery bypass in LangChain's HTMLHeaderTextSplitter. split_text_from_url() validated only the initial URL with validate_safe_url() and then followed redirects without revalidating them, so an attacker-controlled server could redirect the fetch to internal network resources, including AWS IMDSv1 credential endpoints. It is fixed in langchain-text-splitters >= 1.1.2 (with langchain-core >= 1.2.31).
Is GHSA-fv5p-p927-qmxr actively exploited?
No confirmed active exploitation of GHSA-fv5p-p927-qmxr has been reported, but organizations should still patch proactively.
How to fix GHSA-fv5p-p927-qmxr?
Upgrade langchain-text-splitters to >= 1.1.2 and langchain-core to >= 1.2.31, and migrate away from the deprecated split_text_from_url(). The Recommended Action section above lists the full set of steps, including IMDSv2 enforcement, URL-handling audits, and egress monitoring.
What systems are affected by GHSA-fv5p-p927-qmxr?
This vulnerability affects the following AI/ML architecture patterns: RAG pipelines, document processing pipelines, agent frameworks, LLM application servers.
What is the CVSS score for GHSA-fv5p-p927-qmxr?
GHSA-fv5p-p927-qmxr has a CVSS v3.1 base score of 6.5 (MEDIUM).
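The 6.5 figure can be reproduced from the vector published for this advisory (CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:N/A:N). The sketch below applies the CVSS v3.1 base-score equations with the specification's metric weights; it is a worked check rather than a full calculator, and it deliberately handles only Scope: Unchanged vectors.

```python
import math

# CVSS v3.1 metric weights (PR values are the Scope: Unchanged ones).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a CVSS:3.1 vector with Scope: Unchanged."""
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:N/A:N"))  # 6.5
```

Here the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.85 × 0.77 × 0.85 × 0.62 ≈ 2.84; their sum, about 6.43, rounds up to 6.5.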
Technical Details
NVD Description
## Summary

`HTMLHeaderTextSplitter.split_text_from_url()` validated the initial URL using `validate_safe_url()` but then performed the fetch with `requests.get()` with redirects enabled (the default). Because redirect targets were not revalidated, a URL pointing to an attacker-controlled server could redirect to internal, localhost, or cloud metadata endpoints, bypassing SSRF protections.

The response body is parsed and returned as `Document` objects to the calling application code. Whether this constitutes a data exfiltration path depends on the application: if it exposes Document contents (or derivatives) back to the requester who supplied the URL, sensitive data from internal endpoints could be leaked. Applications that store or process Documents internally without returning raw content to the requester are not directly exposed to data exfiltration through this issue.

## Affected versions

- `langchain-text-splitters` < 1.1.2

## Patched versions

- `langchain-text-splitters` >= 1.1.2 (requires `langchain-core` >= 1.2.31)

## Affected code

**File:** `libs/text-splitters/langchain_text_splitters/html.py`, function `split_text_from_url()`

The vulnerable pattern validated the URL once, then fetched with redirects enabled:

```python
validate_safe_url(url, allow_private=False, allow_http=True)
response = requests.get(url, timeout=timeout, **kwargs)
```

## Attack scenario

1. A developer passes external URLs to `split_text_from_url()`, relying on its built-in `validate_safe_url()` check to block requests to internal networks.
2. An attacker supplies a URL pointing to a public host they control. The URL passes `validate_safe_url()` (public hostname, public IP).
3. The attacker's server responds with a `302` redirect to an internal endpoint (e.g., an unauthenticated internal admin API, or a cloud instance metadata service that does not require request headers, such as AWS IMDSv1).
4. `requests.get()` follows the redirect automatically. The redirect target is **not** revalidated.
5. The response body is parsed and returned as `Document` objects to the application.

**Notes:**

- The core issue is a bypass of an explicitly provided SSRF protection. `split_text_from_url()` included `validate_safe_url()` specifically to be safe with untrusted URLs; the redirect loophole defeated that guarantee.
- Cloud metadata endpoints that require special headers (AWS IMDSv2, GCP, Azure) are not reachable through this bug because the attacker does not control request headers. AWS IMDSv1, which requires no headers, is reachable.
- Data exfiltration requires the application to return Document contents to the party that supplied the URL. The SSRF itself (forcing the server to issue a request to an internal endpoint) does not require this.

## Fix

The fix replaces `requests.get()` with an SSRF-safe httpx transport (`SSRFSafeSyncTransport` from `langchain-core`) that validates DNS results and pins connections to validated IPs on every request, including redirect targets, eliminating redirect-based bypasses.

Additionally, `split_text_from_url()` has been deprecated. Users should fetch HTML content themselves and pass it to `split_text()` directly.
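The difference between validating once and validating every hop can be illustrated without touching the network. The sketch below is a toy model, not LangChain code: the `DNS` table, the `REDIRECTS` map, the `attacker.example` hostname, and both helper functions are hypothetical stand-ins for real resolution and real HTTP redirects.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical stand-ins for DNS and for the attacker's 302 response.
DNS = {
    "attacker.example": "93.184.216.34",   # public address
    "169.254.169.254": "169.254.169.254",  # link-local metadata address
}
REDIRECTS = {
    "http://attacker.example/doc": "http://169.254.169.254/latest/meta-data/",
}


def is_allowed(url: str) -> bool:
    """Allow a URL only if its host maps to a globally routable address."""
    host = urlsplit(url).hostname
    return ipaddress.ip_address(DNS[host]).is_global


def follow(url: str, revalidate: bool) -> list:
    """Return the chain of URLs requested, optionally re-checking each redirect."""
    if not is_allowed(url):
        raise ValueError(f"blocked initial URL: {url}")
    visited = [url]
    while url in REDIRECTS:
        url = REDIRECTS[url]
        if revalidate and not is_allowed(url):
            raise ValueError(f"blocked redirect target: {url}")
        visited.append(url)
    return visited


# Validate-once (the vulnerable pattern): the metadata URL is requested.
print(follow("http://attacker.example/doc", revalidate=False))

# Validate-every-hop (the behaviour of the fix): the redirect is rejected.
try:
    follow("http://attacker.example/doc", revalidate=True)
except ValueError as exc:
    print(exc)
```

With `revalidate=False` the chain ends at the link-local address even though the initial URL passed the check, which is the bypass described in steps 2 through 4 of the attack scenario.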
Exploitation Scenario
An attacker targets a LangChain-powered document Q&A application that accepts user-submitted URLs for RAG ingestion. They submit a URL pointing to a server they control (e.g., attacker.com/doc). The application calls split_text_from_url() which validates the URL — attacker.com is a public hostname, so validate_safe_url() passes. When LangChain fetches the URL with requests.get(), attacker.com responds with HTTP 302 redirecting to http://169.254.169.254/latest/meta-data/iam/security-credentials/app-role (AWS IMDSv1). LangChain silently follows the redirect, receives the IAM role credentials JSON, and parses the response body as Document objects. If the application surfaces these Documents in its RAG context or API response, the attacker retrieves valid AWS credentials enabling account-level compromise.
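A request like the redirected one in this scenario is visible in egress telemetry as a connection from the application to a link-local address. As a rough sketch of that kind of check, the function below filters destination IPs against the link-local, RFC1918, and loopback ranges; the `(service, destination_ip)` record shape and the `rag-api` service name are hypothetical, since real flow-log formats vary by platform.

```python
import ipaddress

# Ranges an internet-facing document fetcher should not normally contact.
INTERNAL_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "169.254.0.0/16",  # link-local, includes 169.254.169.254
        "10.0.0.0/8",      # RFC1918
        "172.16.0.0/12",   # RFC1918
        "192.168.0.0/16",  # RFC1918
        "127.0.0.0/8",     # loopback
    )
]


def internal_destinations(records):
    """Return the (service, destination_ip) records that hit an internal range.

    `records` is a hypothetical, already-parsed view of egress logs.
    """
    hits = []
    for service, destination in records:
        ip = ipaddress.ip_address(destination)
        if any(ip in network for network in INTERNAL_NETWORKS):
            hits.append((service, destination))
    return hits


sample = [
    ("rag-api", "93.184.216.34"),
    ("rag-api", "169.254.169.254"),
    ("rag-api", "10.2.3.4"),
]
print(internal_destinations(sample))
```

Services that legitimately call internal APIs will also match, so results of a filter like this would need to be scoped to the components that fetch user-supplied URLs.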
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:N/A:N
References
Related Vulnerabilities
| CVE | CVSS | Title | Relation |
|---|---|---|---|
| CVE-2025-2828 | 10.0 | LangChain RequestsToolkit: SSRF exposes cloud metadata | Same package: langchain |
| CVE-2023-34540 | 9.8 | LangChain: RCE via JiraAPIWrapper crafted input | Same package: langchain |
| CVE-2023-29374 | 9.8 | LangChain: RCE via prompt injection in LLMMathChain | Same package: langchain |
| CVE-2023-34541 | 9.8 | LangChain: RCE via unsafe load_prompt deserialization | Same package: langchain |
| CVE-2023-36258 | 9.8 | LangChain: unauthenticated RCE via code injection | Same package: langchain |