GHSA-fv5p-p927-qmxr: LangChain SSRF bypass exposes

CISO Take

A Server-Side Request Forgery bypass in LangChain's HTMLHeaderTextSplitter allows attackers to weaponize open HTTP redirects to reach internal network resources — including AWS IMDSv1 credential endpoints — that the built-in validate_safe_url() check was explicitly designed to block. With 2,448 downstream dependents and a package risk score of 77/100, this affects a broad cross-section of LangChain-based document ingestion and RAG pipelines that accept user-supplied URLs. No public exploit exists and the CVE is not in CISA KEV, but the attack requires zero privileges and low complexity — any team trusting validate_safe_url() for SSRF protection is currently exposed. Patch to langchain-text-splitters >= 1.1.2 (with langchain-core >= 1.2.31) immediately and audit all usage of split_text_from_url() for user-controlled inputs.

Sources: GitHub Advisory, ATLAS, OpenSSF

What is the risk?

Medium by CVSS (6.5), but operationally higher risk for cloud-deployed AI applications — particularly those running on AWS with IMDSv1 enabled. The vulnerability defeats an explicitly provided security control rather than exploiting a missing one, making it a trust-breaking bug with compounding impact. No authentication is required, attack complexity is low, and the blast radius spans 2,448 downstream packages. Risk escalates sharply when applications surface Document content back to the URL supplier, enabling credential exfiltration in a single interaction.

How does the attack unfold?

Initial Access

The attacker submits a URL on a server they control to an application that calls split_text_from_url(). The URL passes validate_safe_url() because its hostname resolves to a public IP.

AML.T0049
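
To see why this step succeeds, here is a minimal sketch of the kind of check an initial-URL validator performs. `is_public_url()` is a hypothetical stand-in, not LangChain's `validate_safe_url()`. It assumes "safe" means every address the hostname resolves to is globally routable. The sketch shows that such a check only sees the single URL it is handed.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_url(url: str) -> bool:
    """Hypothetical initial-URL SSRF check (not LangChain's implementation).

    Accepts http(s) URLs whose hostname resolves only to globally routable
    addresses. It inspects this one URL; it cannot see where a later
    redirect will point.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(parsed.hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False  # unresolvable hosts are rejected
    addresses = {info[4][0] for info in infos}
    return all(ipaddress.ip_address(addr).is_global for addr in addresses)
```

A public hostname that the attacker controls passes this check. The check never sees the 302 that the same host returns later.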

SSRF Redirect

The attacker's server responds with an HTTP 302 redirect to an internal endpoint (e.g., AWS IMDSv1 at 169.254.169.254). LangChain follows the redirect without revalidating the target.

AML.T0010.001
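
The redirect target in this step is a link-local address, which is what the monitoring advice under "What should I do?" targets. Below is a sketch of how an egress alert could classify destinations. It assumes a hypothetical event feed of `(service, destination_ip)` pairs taken from flow logs or an egress proxy, and the service names are placeholders. Python's `is_private` covers RFC1918 along with other reserved ranges.

```python
import ipaddress
from typing import Iterable, List, Set, Tuple

# Hypothetical egress event shape: (source service name, destination IP string).
Event = Tuple[str, str]

METADATA_IP = ipaddress.ip_address("169.254.169.254")


def classify_destination(ip_str: str) -> str:
    """Label a destination address for SSRF-oriented egress alerting."""
    ip = ipaddress.ip_address(ip_str)
    if ip == METADATA_IP:
        return "cloud-metadata"
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:  # includes the RFC1918 ranges
        return "private"
    if not ip.is_global:
        return "non-global"
    return "public"


def egress_alerts(events: Iterable[Event], watched: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (service, ip, label) for watched services reaching non-public IPs."""
    alerts = []
    for service, ip_str in events:
        if service not in watched:
            continue  # only services that fetch user-supplied URLs are in scope
        label = classify_destination(ip_str)
        if label != "public":
            alerts.append((service, ip_str, label))
    return alerts
```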

Internal Resource Fetch

requests.get() automatically follows the redirect to the internal endpoint, obtaining sensitive data such as IAM role credentials or unauthenticated internal API responses.

AML.T0025
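
The redirect-following behavior in these two steps can be reproduced locally without touching a real metadata service. In this sketch, one loopback server plays the internal endpoint and another plays the attacker's redirector. The handler names, path, and JSON body are invented for the test. The sketch uses the standard library's `urllib.request`, whose default opener follows a 302 the same way `requests.get()` does with its default `allow_redirects=True`.

```python
import http.server
import threading
import urllib.request


class InternalHandler(http.server.BaseHTTPRequestHandler):
    """Stands in for an internal-only endpoint (invented test data)."""

    def do_GET(self):
        body = b'{"secret": "internal-only"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass


def make_redirector(target: str):
    """Build a handler class that answers every GET with a 302 to `target`."""

    class RedirectHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(302)
            self.send_header("Location", target)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, *args):
            pass

    return RedirectHandler


def serve(handler):
    """Start a server on an ephemeral loopback port in a daemon thread."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo():
    internal = serve(InternalHandler)
    internal_url = f"http://127.0.0.1:{internal.server_address[1]}/latest/meta-data/"
    redirector = serve(make_redirector(internal_url))
    start_url = f"http://127.0.0.1:{redirector.server_address[1]}/doc"
    # Ignore proxy environment variables so the request really goes to loopback.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        # The default opener follows the 302 without any further check.
        with opener.open(start_url, timeout=5) as resp:
            return resp.geturl(), resp.read()
    finally:
        for server in (internal, redirector):
            server.shutdown()
            server.server_close()
```

With `requests`, the intermediate 302 would appear in `response.history`, and `allow_redirects=False` would stop at it. Neither protects anything unless each hop is revalidated.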

Data Exfiltration

Internal endpoint response is parsed into Document objects and returned to the application; if surfaced to the attacker via application output or RAG context, sensitive credentials or internal data are exfiltrated.

AML.T0025
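
Exfiltration only happens if Document content flows back to whoever supplied the URL. As a defense-in-depth measure, not a fix, an application could screen `Document.page_content` strings before returning them. The patterns below are illustrative and incomplete. They assume the field names used in AWS instance-credential JSON and the `ASIA` prefix of temporary access key IDs. `redact_before_return()` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Illustrative (incomplete) markers of AWS instance-credential JSON.
CREDENTIAL_PATTERNS = [
    re.compile(r'"AccessKeyId"\s*:'),
    re.compile(r'"SecretAccessKey"\s*:'),
    re.compile(r'"Token"\s*:'),
    re.compile(r"\bASIA[0-9A-Z]{16}\b"),  # temporary access key ID shape
]


def looks_like_credentials(text: str) -> bool:
    return any(p.search(text) for p in CREDENTIAL_PATTERNS)


def redact_before_return(page_contents: Iterable[str]) -> List[str]:
    """Hypothetical filter: withhold chunks that resemble cloud credentials."""
    return [
        "[withheld: content resembled cloud credentials]" if looks_like_credentials(c) else c
        for c in page_contents
    ]
```

For example, call `redact_before_return(d.page_content for d in docs)` on the list the splitter returns.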

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
langchain-text-splitters	pip	< 1.1.2	>= 1.1.2 (requires langchain-core >= 1.2.31)
139.8K, OpenSSF 5.9, 2.7K dependents, pushed 2d ago, 24% patched, ~156d to patch

If you run langchain-text-splitters < 1.1.2 and pass untrusted URLs to HTMLHeaderTextSplitter.split_text_from_url(), you are exposed.

How severe is it?

CVSS 3.1

6.5 / 10

EPSS

N/A

Exploitation Status

No known exploitation

Sophistication

Trivial

What is the attack surface?

AV Network

AC Low

PR None

UI Required

S Unchanged

C High

I None

A None
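
The 6.5 base score follows directly from this vector (AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:N/A:N) under the CVSS v3.1 base equations. The sketch below hard-codes the v3.1 metric weights. It handles only the scope-unchanged case, which is all this vector needs.

```python
import math

# CVSS v3.1 metric weights (scope-unchanged subset).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    # Impact sub-score from the confidentiality/integrity/availability weights.
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR_UNCHANGED[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))
```

The same vector with UI:N instead evaluates to 7.5 (High). The Required user-interaction metric is therefore what keeps this advisory in the Medium band.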

What should I do?

1) Upgrade langchain-text-splitters to >= 1.1.2 and langchain-core to >= 1.2.31 immediately. The fix replaces requests.get() with an SSRF-safe httpx transport that validates DNS results and pins connections on every request, including redirect targets.
2) Migrate away from the now-deprecated split_text_from_url(). Instead, fetch the HTML yourself with an SSRF-safe HTTP client and pass the content directly to split_text().
3) Require IMDSv2 (HttpTokens=required) on all EC2 instances running LangChain workloads. Metadata requests must then carry a session token header, which closes the IMDSv1 path this bug relies on.
4) Audit every codepath in AI document processing pipelines that accepts external URLs.
5) Watch for exploitation attempts by monitoring outbound HTTP requests from LangChain services to 169.254.169.254 and other link-local or RFC1918 ranges.
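
The manual-fetch step can be sketched with the standard library. The approach is to disable automatic redirects, then re-run the URL policy on every hop before following it. `fetch_html()`, `public_only()`, and the redirect limit are hypothetical. This is weaker than the upstream fix. The policy resolves the hostname, and urllib resolves it again when connecting, so a DNS-rebinding attacker could still exploit the gap that the advisory says `SSRFSafeSyncTransport` closes by pinning connections to validated IPs.

```python
import ipaddress
import socket
import urllib.error
import urllib.request
from typing import Callable
from urllib.parse import urljoin, urlparse


def public_only(url: str) -> bool:
    """Hypothetical policy: http(s) URLs whose host resolves only to global IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(parsed.hostname, port)
    except socket.gaierror:
        return False
    return all(ipaddress.ip_address(i[4][0]).is_global for i in infos)


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # surface the 3xx as HTTPError instead of following it


def fetch_html(url: str, is_allowed: Callable[[str], bool] = public_only,
               max_redirects: int = 5, timeout: float = 10.0) -> str:
    """Hypothetical fetcher: re-checks the policy on the first URL and every hop."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), _NoRedirect())
    for _ in range(max_redirects + 1):
        if not is_allowed(url):
            raise PermissionError(f"blocked by URL policy: {url}")
        try:
            with opener.open(url, timeout=timeout) as resp:
                charset = resp.headers.get_content_charset() or "utf-8"
                return resp.read().decode(charset, errors="replace")
        except urllib.error.HTTPError as err:
            location = err.headers.get("Location")
            if err.code in (301, 302, 303, 307, 308) and location:
                err.close()
                url = urljoin(url, location)  # revalidated at the top of the loop
                continue
            raise
    raise RuntimeError("too many redirects")
```

The returned HTML can then go straight to the splitter, e.g. `HTMLHeaderTextSplitter(headers_to_split_on=[("h1", "Header 1"), ("h2", "Header 2")]).split_text(html)`.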

How is it classified?

Tags: Data Extraction, Auth Bypass, Framework, RAG. MITRE ATLAS: AML.T0010.001 - AI Software, AML.T0025 - Exfiltration via Cyber Means, AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.9.2 - Information security controls for AI systems

NIST AI RMF

MANAGE 2.2 - Mechanisms to respond and recover from AI risks

OWASP LLM Top 10

LLM02 - Sensitive Information Disclosure, LLM06 - Excessive Agency

Frequently Asked Questions

Is GHSA-fv5p-p927-qmxr actively exploited?

No confirmed active exploitation of GHSA-fv5p-p927-qmxr has been reported, but organizations should still patch proactively.

What systems are affected by GHSA-fv5p-p927-qmxr?

This vulnerability affects the following AI/ML architecture patterns: RAG pipelines, document processing pipelines, agent frameworks, LLM application servers.

What is the CVSS score for GHSA-fv5p-p927-qmxr?

GHSA-fv5p-p927-qmxr has a CVSS v3.1 base score of 6.5 (MEDIUM).

What is the AI security impact?

Affected AI Architectures

RAG pipelines, document processing pipelines, agent frameworks, LLM application servers

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0025 Exfiltration via Cyber Means

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: A.9.2

NIST AI RMF: MANAGE 2.2

OWASP LLM Top 10: LLM02, LLM06

What are the technical details?

Original Advisory

## Summary

`HTMLHeaderTextSplitter.split_text_from_url()` validated the initial URL using `validate_safe_url()` but then performed the fetch with `requests.get()` with redirects enabled (the default). Because redirect targets were not revalidated, a URL pointing to an attacker-controlled server could redirect to internal, localhost, or cloud metadata endpoints, bypassing SSRF protections.

The response body is parsed and returned as `Document` objects to the calling application code. Whether this constitutes a data exfiltration path depends on the application: if it exposes Document contents (or derivatives) back to the requester who supplied the URL, sensitive data from internal endpoints could be leaked. Applications that store or process Documents internally without returning raw content to the requester are not directly exposed to data exfiltration through this issue.

## Affected versions

- `langchain-text-splitters` < 1.1.2

## Patched versions

- `langchain-text-splitters` >= 1.1.2 (requires `langchain-core` >= 1.2.31)

## Affected code

**File:** `libs/text-splitters/langchain_text_splitters/html.py`, `split_text_from_url()`

The vulnerable pattern validated the URL once, then fetched with redirects enabled:

```python
# Only the initial URL is checked...
validate_safe_url(url, allow_private=False, allow_http=True)
# ...then requests.get() follows redirects by default (allow_redirects=True),
# so a 3xx Location pointing at an internal host is never revalidated.
response = requests.get(url, timeout=timeout, **kwargs)
```

## Attack scenario

1. A developer passes external URLs to `split_text_from_url()`, relying on its built-in `validate_safe_url()` check to block requests to internal networks.
2. An attacker supplies a URL pointing to a public host they control. The URL passes `validate_safe_url()` (public hostname, public IP).
3. The attacker's server responds with a `302` redirect to an internal endpoint (e.g., an unauthenticated internal admin API, or a cloud instance metadata service that does not require request headers, such as AWS IMDSv1).
4. `requests.get()` follows the redirect automatically. The redirect target is **not** revalidated.
5. The response body is parsed and returned as `Document` objects to the application.

**Notes:**

- The core issue is a bypass of an explicitly provided SSRF protection. `split_text_from_url()` included `validate_safe_url()` specifically to be safe with untrusted URLs; the redirect loophole defeated that guarantee.
- Cloud metadata endpoints that require special headers (AWS IMDSv2, GCP, Azure) are not reachable through this bug because the attacker does not control request headers. AWS IMDSv1, which requires no headers, is reachable.
- Data exfiltration requires the application to return Document contents to the party that supplied the URL. The SSRF itself (forcing the server to issue a request to an internal endpoint) does not require this.

## Fix

The fix replaces `requests.get()` with an SSRF-safe httpx transport (`SSRFSafeSyncTransport` from `langchain-core`) that validates DNS results and pins connections to validated IPs on every request, including redirect targets, eliminating redirect-based bypasses.

Additionally, `split_text_from_url()` has been deprecated. Users should fetch HTML content themselves and pass it to `split_text()` directly.
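
The core idea of the fix can be illustrated with `http.client`: resolve once, validate, and then connect to the validated IP rather than re-resolving. This is not `SSRFSafeSyncTransport`, whose API is not described here. The function names are hypothetical. The sketch covers plain HTTP only, because pinning HTTPS also requires keeping the original hostname for SNI and certificate checks.

```python
import http.client
import ipaddress
import socket


def resolve_and_pin(host: str, port: int) -> str:
    """Hypothetical: resolve once, reject any non-global address, return one to pin."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addresses = [info[4][0] for info in infos]
    for addr in addresses:
        if not ipaddress.ip_address(addr).is_global:
            raise PermissionError(f"{host} resolves to non-global address {addr}")
    return addresses[0]


def get_pinned(host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> int:
    """Hypothetical plain-HTTP GET that connects to the validated IP.

    Returns the status code. Redirects are NOT followed; a 3xx Location
    would have to go back through resolve_and_pin() before any new request.
    """
    ip = resolve_and_pin(host, port)
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Connect to the pinned IP but send the original hostname for virtual hosting.
        conn.request("GET", path, headers={"Host": host})
        return conn.getresponse().status
    finally:
        conn.close()
```

`get_pinned()` does not follow redirects, so the caller must parse any `Location` it receives and send it back through `resolve_and_pin()`. The advisory attributes this per-hop revalidation to the patched transport.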

Exploitation Scenario

An attacker targets a LangChain-powered document Q&A application that accepts user-submitted URLs for RAG ingestion. They submit a URL pointing to a server they control (e.g., attacker.com/doc). The application calls split_text_from_url(), which validates the URL. Because attacker.com is a public hostname, validate_safe_url() passes. When LangChain fetches the URL with requests.get(), attacker.com responds with an HTTP 302 redirect to http://169.254.169.254/latest/meta-data/iam/security-credentials/app-role (AWS IMDSv1). LangChain silently follows the redirect, receives the IAM role credentials JSON, and parses the response body into Document objects. If the application surfaces these Documents in its RAG context or API response, the attacker obtains valid temporary AWS credentials for the instance role. They can then act with whatever permissions that role holds.
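
This scenario depends on IMDSv1 answering header-less GET requests. The IMDSv2 remediation step can be sketched with boto3's EC2 client. The sketch assumes credentials that allow `ec2:DescribeInstances` and `ec2:ModifyInstanceMetadataOptions`, and a single, unpaginated `describe_instances` response. The helper names are hypothetical. Test before rolling out, since software that only speaks IMDSv1 stops receiving metadata once tokens are required.

```python
from typing import Iterable, List


def instances_allowing_imdsv1(describe_response: dict) -> List[str]:
    """Hypothetical helper: IDs whose MetadataOptions still allow token-less IMDSv1.

    Expects the dict shape returned by boto3's ec2.describe_instances()
    (single page; real fleets should paginate).
    """
    ids = []
    for reservation in describe_response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            if inst.get("MetadataOptions", {}).get("HttpTokens") == "optional":
                ids.append(inst["InstanceId"])
    return ids


def require_imdsv2(ec2_client, instance_ids: Iterable[str]) -> List[str]:
    """Hypothetical helper: set HttpTokens=required on each instance.

    `ec2_client` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the instance IDs that were updated.
    """
    updated = []
    for instance_id in instance_ids:
        ec2_client.modify_instance_metadata_options(
            InstanceId=instance_id,
            HttpTokens="required",   # metadata requests must carry an IMDSv2 session token
            HttpEndpoint="enabled",  # keep the metadata service reachable for IMDSv2 clients
        )
        updated.append(instance_id)
    return updated
```

For example, with `ec2 = boto3.client("ec2")`, call `require_imdsv2(ec2, instances_allowing_imdsv1(ec2.describe_instances()))`.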

Weaknesses (CWE)

CWE-918 Server-Side Request Forgery (SSRF) Primary

CWE-918 — Server-Side Request Forgery (SSRF): The web server receives a URL or similar request from an upstream component and retrieves the contents of this URL, but it does not sufficiently ensure that the request is being sent to the expected destination.

Source: MITRE CWE corpus.