GHSA-fv5p-p927-qmxr: langchain-text-splitters: SSRF bypass exposes cloud metadata

GHSA-fv5p-p927-qmxr MEDIUM
Published April 16, 2026
CISO Take

A Server-Side Request Forgery bypass in LangChain's HTMLHeaderTextSplitter.split_text_from_url() lets an attacker use an HTTP redirect from a server they control to reach internal network resources, including AWS IMDSv1 credential endpoints, that the built-in validate_safe_url() check was explicitly designed to block. With 2,448 downstream dependents and a package risk score of 77/100, this affects a broad cross-section of LangChain-based document ingestion and RAG pipelines that accept user-supplied URLs. No public exploit exists and the advisory is not in CISA KEV, but the attack needs no privileges and is low in complexity, so any team relying on validate_safe_url() for SSRF protection in versions below 1.1.2 is exposed. Upgrade to langchain-text-splitters >= 1.1.2 (with langchain-core >= 1.2.31) immediately and audit all usage of split_text_from_url() for user-controlled inputs.

Sources: GitHub Advisory, ATLAS, OpenSSF

Risk Assessment

Medium by CVSS (6.5), but operationally higher risk for cloud-deployed AI applications — particularly those running on AWS with IMDSv1 enabled. The vulnerability defeats an explicitly provided security control rather than exploiting a missing one, making it a trust-breaking bug with compounding impact. No authentication is required, attack complexity is low, and the blast radius spans 2,448 downstream packages. Risk escalates sharply when applications surface Document content back to the URL supplier, enabling credential exfiltration in a single interaction.

Attack Kill Chain

Initial Access (AML.T0049)
Attacker submits a crafted URL pointing to an attacker-controlled server to an application using split_text_from_url(); the URL passes validate_safe_url() because the hostname resolves to a public IP.

SSRF Redirect (AML.T0010.001)
Attacker's server responds with an HTTP 302 redirect targeting an internal endpoint (e.g., AWS IMDSv1 at 169.254.169.254) that is not revalidated before LangChain follows it.

Internal Resource Fetch (AML.T0025)
requests.get() automatically follows the redirect to the internal endpoint, obtaining sensitive data such as IAM role credentials or unauthenticated internal API responses.

Data Exfiltration (AML.T0025)
Internal endpoint response is parsed into Document objects and returned to the application; if surfaced to the attacker via application output or RAG context, sensitive credentials or internal data are exfiltrated.
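
The stages above can be walked through offline. The following is a minimal simulation, not LangChain's code: the DNS table, the response table, and every function name here are hypothetical stand-ins, and no network traffic is generated. It only illustrates why checking the first URL says nothing about where a redirect-following client ends up.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical DNS and HTTP tables; nothing here touches the network.
FAKE_DNS = {"attacker.example": "93.184.216.34"}
FAKE_WEB = {
    # The attacker-controlled host answers with a redirect to the metadata IP.
    "http://attacker.example/doc": (302, "http://169.254.169.254/latest/meta-data/"),
    "http://169.254.169.254/latest/meta-data/": (200, "<placeholder metadata listing>"),
}


def resolves_to_public_ip(url: str) -> bool:
    """Return True if the URL's host maps to a globally routable address."""
    host = urlsplit(url).hostname or ""
    ip = ipaddress.ip_address(FAKE_DNS.get(host, host))
    return ip.is_global


def fetch_following_redirects(url: str, max_hops: int = 5):
    """Mimic a client that follows redirects; return (final_url, body)."""
    for _ in range(max_hops):
        status, payload = FAKE_WEB[url]
        if status in (301, 302, 303, 307, 308):
            url = payload  # follow the Location target without any check
            continue
        return url, payload
    raise RuntimeError("too many redirects")


def validate_once_then_fetch(url: str):
    """The flawed pattern: validate the first URL only, then follow redirects."""
    if not resolves_to_public_ip(url):
        raise ValueError("blocked")
    return fetch_following_redirects(url)


final_url, body = validate_once_then_fetch("http://attacker.example/doc")
print(final_url)
```

The initial URL passes the check, yet the body that comes back was served by the link-local address, which the same check would have rejected had it been applied to the redirect target.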

Affected Systems

Package                   Ecosystem  Vulnerable Range  Patched
langchain-text-splitters  pip        < 1.1.2           1.1.2

Package stats: 133.2K, OpenSSF 6.0, 2.4K dependents, pushed 5d ago, 18% patched, ~256d to patch

If you run langchain-text-splitters below 1.1.2 and call split_text_from_url(), you are affected.

Severity & Risk

CVSS 3.1: 6.5 / 10
EPSS: N/A
Exploitation Status: No known exploitation
Sophistication: Trivial

Attack Surface

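Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): Required
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): None
Availability (A): None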

Recommended Action

1) Upgrade langchain-text-splitters to >= 1.1.2 and langchain-core to >= 1.2.31 immediately. The fix replaces requests.get() with an SSRF-safe httpx transport that validates DNS results and pins connections on every request, including redirect targets.
2) Migrate away from the now-deprecated split_text_from_url(): fetch HTML manually using an SSRF-safe HTTP client and pass content directly to split_text().
3) Enable AWS IMDSv2 on all EC2 instances running LangChain workloads to require token headers and neutralize the IMDSv1 exposure path.
4) Audit all codepaths that accept external URLs for AI document processing pipelines.
5) Detect exploitation attempts by monitoring for outbound HTTP requests to 169.254.169.254 or other link-local and RFC1918 ranges originating from LangChain services.
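
For the monitoring step, a small filter over destination addresses taken from egress or proxy logs may be enough to start with. This is a sketch under stated assumptions: the function name and the input shape (a plain list of destination IP strings) are hypothetical, only IPv4 ranges are covered, and loopback is included as an extra assumption beyond the ranges named in the action list.

```python
import ipaddress

# Ranges worth alerting on when a LangChain service connects to them.
SUSPICIOUS_NETS = [
    ipaddress.ip_network(net)
    for net in (
        "169.254.0.0/16",  # link-local, includes 169.254.169.254 (cloud metadata)
        "10.0.0.0/8",      # RFC 1918
        "172.16.0.0/12",   # RFC 1918
        "192.168.0.0/16",  # RFC 1918
        "127.0.0.0/8",     # loopback (assumption: also of interest)
    )
]


def flag_internal_destinations(dest_ips):
    """Return the destination IPs that fall inside a suspicious range."""
    flagged = []
    for raw in dest_ips:
        try:
            ip = ipaddress.ip_address(raw)
        except ValueError:
            continue  # skip log fields that are not IP addresses
        if any(ip in net for net in SUSPICIOUS_NETS):
            flagged.append(raw)
    return flagged


print(flag_internal_destinations(["93.184.216.34", "169.254.169.254", "10.1.2.3"]))
```

Services that legitimately talk to private addresses (databases, internal APIs) would need an allowlist in front of this filter to keep alert volume manageable.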

Classification

Compliance Impact

This advisory is relevant to:

EU AI Act: Article 15 - Accuracy, robustness and cybersecurity
ISO 42001: A.9.2 - Information security controls for AI systems
NIST AI RMF: MANAGE 2.2 - Mechanisms to respond and recover from AI risks
OWASP LLM Top 10: LLM02 - Sensitive Information Disclosure; LLM06 - Excessive Agency

Frequently Asked Questions

What is GHSA-fv5p-p927-qmxr?

GHSA-fv5p-p927-qmxr is an SSRF-protection bypass in HTMLHeaderTextSplitter.split_text_from_url() in langchain-text-splitters versions below 1.1.2. The method validated the initial URL with validate_safe_url() but then followed HTTP redirects without revalidating the target, so an attacker-controlled server could redirect the fetch to internal, localhost, or cloud metadata endpoints such as AWS IMDSv1.

Is GHSA-fv5p-p927-qmxr actively exploited?

No confirmed active exploitation of GHSA-fv5p-p927-qmxr has been reported, but organizations should still patch proactively.

How to fix GHSA-fv5p-p927-qmxr?

Upgrade langchain-text-splitters to >= 1.1.2 together with langchain-core >= 1.2.31, and stop calling the now-deprecated split_text_from_url() on untrusted URLs: fetch the HTML with an SSRF-safe HTTP client and pass it to split_text(). The additional hardening steps (IMDSv2, URL-handling audit, egress monitoring) are listed under Recommended Action.

What systems are affected by GHSA-fv5p-p927-qmxr?

This vulnerability affects the following AI/ML architecture patterns: RAG pipelines, document processing pipelines, agent frameworks, LLM application servers.

What is the CVSS score for GHSA-fv5p-p927-qmxr?

GHSA-fv5p-p927-qmxr has a CVSS v3.1 base score of 6.5 (MEDIUM).

Technical Details

NVD Description

## Summary

`HTMLHeaderTextSplitter.split_text_from_url()` validated the initial URL using `validate_safe_url()` but then performed the fetch with `requests.get()` with redirects enabled (the default). Because redirect targets were not revalidated, a URL pointing to an attacker-controlled server could redirect to internal, localhost, or cloud metadata endpoints, bypassing SSRF protections.

The response body is parsed and returned as `Document` objects to the calling application code. Whether this constitutes a data exfiltration path depends on the application: if it exposes Document contents (or derivatives) back to the requester who supplied the URL, sensitive data from internal endpoints could be leaked. Applications that store or process Documents internally without returning raw content to the requester are not directly exposed to data exfiltration through this issue.

## Affected versions

- `langchain-text-splitters` < 1.1.2

## Patched versions

- `langchain-text-splitters` >= 1.1.2 (requires `langchain-core` >= 1.2.31)

## Affected code

**File:** `libs/text-splitters/langchain_text_splitters/html.py`, `split_text_from_url()`

The vulnerable pattern validated the URL once then fetched with redirects enabled:

```python
validate_safe_url(url, allow_private=False, allow_http=True)
response = requests.get(url, timeout=timeout, **kwargs)
```

## Attack scenario

1. A developer passes external URLs to `split_text_from_url()`, relying on its built-in `validate_safe_url()` check to block requests to internal networks.
2. An attacker supplies a URL pointing to a public host they control. The URL passes `validate_safe_url()` (public hostname, public IP).
3. The attacker's server responds with a `302` redirect to an internal endpoint (e.g., an unauthenticated internal admin API, or a cloud instance metadata service that does not require request headers, such as AWS IMDSv1).
4. `requests.get()` follows the redirect automatically. The redirect target is **not** revalidated.
5. The response body is parsed and returned as `Document` objects to the application.

**Notes:**

- The core issue is a bypass of an explicitly provided SSRF protection. `split_text_from_url()` included `validate_safe_url()` specifically to be safe with untrusted URLs; the redirect loophole defeated that guarantee.
- Cloud metadata endpoints that require special headers (AWS IMDSv2, GCP, Azure) are not reachable through this bug because the attacker does not control request headers. AWS IMDSv1, which requires no headers, is reachable.
- Data exfiltration requires the application to return Document contents to the party that supplied the URL. The SSRF itself (forcing the server to issue a request to an internal endpoint) does not require this.

## Fix

The fix replaces `requests.get()` with an SSRF-safe httpx transport (`SSRFSafeSyncTransport` from `langchain-core`) that validates DNS results and pins connections to validated IPs on every request, including redirect targets, eliminating redirect-based bypasses.

Additionally, `split_text_from_url()` has been deprecated. Users should fetch HTML content themselves and pass it to `split_text()` directly.
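
For teams fetching HTML themselves, the idea of validating every hop can be sketched with the standard library alone. This is not the `SSRFSafeSyncTransport` implementation; `resolve_all`, `assert_public`, `safe_fetch`, and the `send` callback are hypothetical names introduced for illustration. The sketch also does not pin the connection to the validated IP, so unlike the described fix it leaves a DNS-rebinding window open.

```python
import ipaddress
import socket
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}
# send(url) must issue ONE request without following redirects and
# return (status_code, location_header_or_None, body).
Send = Callable[[str], Tuple[int, Optional[str], str]]


def resolve_all(host: str) -> List[str]:
    """Resolve a hostname to every address the system resolver returns."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def assert_public(url: str, resolver: Callable[[str], List[str]] = resolve_all) -> None:
    """Raise ValueError unless every resolved address is globally routable."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    addresses = resolver(parts.hostname)
    if not addresses:
        raise ValueError(f"no addresses for {url!r}")
    for addr in addresses:
        if not ipaddress.ip_address(addr).is_global:
            raise ValueError(f"blocked non-public address {addr} for {url!r}")


def safe_fetch(url: str, send: Send,
               resolver: Callable[[str], List[str]] = resolve_all,
               max_redirects: int = 5) -> str:
    """Fetch a URL, revalidating the target before every hop."""
    for _ in range(max_redirects + 1):
        assert_public(url, resolver)  # runs on the first URL AND each redirect
        status, location, body = send(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # handles relative Location headers
            continue
        return body
    raise ValueError("too many redirects")
```

With the `requests` library, `send` could wrap `requests.get(url, allow_redirects=False, timeout=...)` and read the `Location` response header; the returned HTML can then be passed to `split_text()` as the advisory recommends.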

Exploitation Scenario

An attacker targets a LangChain-powered document Q&A application that accepts user-submitted URLs for RAG ingestion. They submit a URL pointing to a server they control (e.g., attacker.com/doc). The application calls split_text_from_url(), which validates the URL; attacker.com is a public hostname resolving to a public IP, so validate_safe_url() passes. When LangChain fetches the URL with requests.get(), attacker.com responds with an HTTP 302 redirecting to http://169.254.169.254/latest/meta-data/iam/security-credentials/app-role (AWS IMDSv1). LangChain follows the redirect without revalidating it, receives the IAM role credentials JSON, and parses the response body into Document objects. If the application surfaces these Documents in its RAG context or API response, the attacker obtains valid temporary AWS credentials carrying whatever permissions the instance role holds.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:N/A:N
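
The 6.5 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The sketch below covers only the Scope: Unchanged case, which is what this vector uses; the function names are illustrative, and the metric weights are the published v3.1 constants.

```python
import math

# CVSS v3.1 metric weights (PR values are those for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with S:U."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch handles Scope: Unchanged only")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score_scope_unchanged("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:N/A:N"))  # 6.5
```

Here the impact sub-score is about 3.60 and exploitability about 2.84; their sum of roughly 6.43 rounds up to 6.5.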

Timeline

Published: April 16, 2026
Last Modified: April 16, 2026
First Seen: April 17, 2026

Related Vulnerabilities