CVE-2026-7482: Ollama heap OOB read leaks API

CISO Take

Ollama before v0.17.1 has a critical heap out-of-bounds read in its GGUF model loader that allows an unauthenticated attacker to extract API keys, environment variables, system prompts, and live user conversation data directly from server heap memory. Neither /api/create nor /api/push requires authentication in the upstream distribution, so any host that can reach the server can trigger the read without credentials and then push the resulting artifact, with the leaked memory embedded in it, to an attacker-controlled registry. The current EPSS score is modest (1.9%), but the attack requires no privileges, no user interaction, and minimal technical skill, and real-world exposure is amplified by the widely documented OLLAMA_HOST=0.0.0.0 configuration, which places many instances on the public internet. Upgrade to v0.17.1 immediately; if patching is not feasible today, firewall port 11434 and place an authenticating reverse proxy in front of /api/create and /api/push.

Sources: NVD, EPSS, ATLAS, GitHub Advisory

What is the risk?

Critical risk for any networked Ollama deployment. The CVSS 9.1 score reflects no authentication barrier, low attack complexity, and high confidentiality and availability impact. The absence of a KEV listing or public exploit lowers near-term mass exploitation probability, but the attack primitive is mechanically trivial — craft a malformed GGUF file and POST it to an open endpoint. Organizations using OLLAMA_HOST=0.0.0.0 (common in homelab, developer, and on-premise AI deployments) face direct internet-scale exposure. The leakable data — API keys, system prompts, concurrent session conversations — represents severe secondary blast radius beyond the immediate host, potentially compromising upstream AI service accounts and violating user privacy obligations.
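As a rough illustration of the exposure condition described above, the sketch below classifies an OLLAMA_HOST value as loopback-only or not. The function name and the parsing rules are assumptions made for this example, not Ollama's own parsing logic; values it cannot interpret (hostnames, an empty host) are flagged for manual review rather than assumed safe.

```python
import ipaddress
from typing import Optional


def binds_beyond_loopback(ollama_host: Optional[str]) -> bool:
    """Return True if the value appears to listen on a non-loopback address.

    Hypothetical helper: the accepted formats (bare host, host:port,
    scheme://host:port, [ipv6]:port) are assumptions for illustration.
    """
    if not ollama_host:
        return False  # the advisory states the default bind is 127.0.0.1
    value = ollama_host.strip()
    if "://" in value:                 # drop an optional scheme
        value = value.split("://", 1)[1]
    value = value.split("/", 1)[0]     # drop any path
    if value.startswith("["):          # bracketed IPv6, e.g. [::]:11434
        host = value[1:].split("]", 1)[0]
    elif value.count(":") == 1:        # host:port
        host = value.split(":", 1)[0]
    else:                              # bare host or bare IPv6
        host = value
    if host == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        return True  # hostname or empty host: flag for manual review
```

For example, `binds_beyond_loopback("0.0.0.0:11434")` is true, while `binds_beyond_loopback("127.0.0.1")` is false.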

How does the attack unfold?

Discovery

Attacker scans the internet for port 11434 using Shodan or Censys to identify Ollama instances configured with OLLAMA_HOST=0.0.0.0.

AML.T0006

Exploitation

Attacker crafts a malformed GGUF file with tensor offsets exceeding file length and POSTs it to the unauthenticated /api/create endpoint, triggering a heap OOB read during quantization.

AML.T0049

Collection

Server reads past the heap buffer boundary, capturing adjacent memory containing API keys, environment variables, system prompts, and active user conversation data into the quantized model artifact.

AML.T0055

Exfiltration

Attacker calls the unauthenticated /api/push endpoint to upload the artifact containing leaked heap memory to an attacker-controlled registry, completing silent data theft.

AML.T0025

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
Ollama	Go	< 0.17.1	0.17.1

Do you run Ollama? Any version before v0.17.1 is affected.
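One way to check a given instance is to compare its reported version against the patched release. The sketch below assumes the server answers GET /api/version with a JSON object containing a "version" string; the helper names and the base URL are illustrative. Pre-release suffixes are ignored by the parser, so treat such builds with care.

```python
import json
import re
import urllib.request

PATCHED = (0, 17, 1)  # first fixed release according to the advisory


def parse_version(text: str) -> tuple:
    """Parse 'v0.17.1' or '0.17.1' into (0, 17, 1); suffixes are ignored."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_vulnerable(version_text: str) -> bool:
    """True if the version sorts before the patched release."""
    return parse_version(version_text) < PATCHED


def fetch_version(base_url: str = "http://127.0.0.1:11434", timeout: float = 5.0) -> str:
    """Ask a running server for its version (assumed response: {"version": "..."})."""
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
        return json.load(resp)["version"]
```

Calling `is_vulnerable(fetch_version())` against a local instance would then indicate whether an upgrade is still needed.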

How severe is it?

CVSS 3.1

9.1 / 10

EPSS

1.9%

chance of exploitation in 30 days

Higher than 78% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

No known exploitation

Sophistication

Trivial

What is the attack surface?

AV Network

AC Low

PR None

UI None

S Unchanged

C High

I None

A High
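The 9.1 base score can be reproduced from the metrics listed above using the CVSS v3.1 base-score equations. The sketch below covers only the Scope Unchanged case, which is the one that applies here; the function names are illustrative, while the weights and rounding rule follow the published specification.

```python
import math

# CVSS v3.1 metric weights (Privileges Required values are for Scope Unchanged).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a vector with S:U."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:N / AC:L / PR:N / UI:N / S:U / C:H / I:N / A:H
score = base_score_scope_unchanged("N", "L", "N", "N", "H", "N", "H")
```

With these inputs the impact sub-score is about 5.18 and the exploitability sub-score about 3.89, which rounds up to 9.1.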

What should I do?

5 steps

Patch: Upgrade to Ollama v0.17.1 immediately (https://github.com/ollama/ollama/releases/tag/v0.17.1).
Network isolation: Restrict TCP port 11434 to trusted internal hosts via firewall rules; block all public internet access.
Authentication layer: If immediate patching is not possible, deploy an authenticating reverse proxy (nginx, Caddy, or Traefik with basic auth or OAuth) in front of /api/create and /api/push endpoints.
Detection: Audit access logs for unexpected POST requests to /api/create or /api/push from external IPs; monitor for outbound model push requests to unrecognized registry hosts; review deployed model artifacts for unexpected binary content.
Credential rotation: If exposure is suspected, immediately rotate all API keys and secrets present in the Ollama server environment.
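The detection step can be sketched as a small log filter that flags POST requests to the two sensitive endpoints from addresses outside a trusted set. The log line layout assumed here (pipe-separated status, latency, client IP, then method and quoted path) and the trusted networks are assumptions for illustration; adjust both to match the actual server or proxy logs.

```python
import ipaddress
import re

# Assumed trusted ranges; replace with the networks that should reach the API.
TRUSTED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("127.0.0.0/8", "10.0.0.0/8", "192.168.0.0/16", "::1/128")
]
SENSITIVE_PATHS = {"/api/create", "/api/push"}

# Assumed layout: ... | <client ip> | <METHOD>   "<path>"
LOG_LINE = re.compile(
    r'\|\s*(?P<ip>[0-9a-fA-F:.]+)\s*\|\s*(?P<method>[A-Z]+)\s+"(?P<path>[^"?]+)'
)


def suspicious_requests(lines):
    """Return (ip, path) pairs for untrusted POSTs to the sensitive endpoints."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue
        if match["method"] != "POST" or match["path"] not in SENSITIVE_PATHS:
            continue
        try:
            client = ipaddress.ip_address(match["ip"])
        except ValueError:
            continue  # not a parseable address; skip rather than guess
        if not any(client in network for network in TRUSTED_NETWORKS):
            hits.append((str(client), match["path"]))
    return hits
```

Feeding it the lines of an access log (for example `suspicious_requests(open(path))`, where `path` is wherever the logs are stored) yields candidates for manual review; a hit is a lead, not proof of exploitation.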

What does CISA's SSVC say?

Decision Track*

Exploitation none

Automatable Yes

Technical Impact total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Data Extraction, Data Leakage, Auth Bypass, Inference Model

AML.T0006 - Active Scanning
AML.T0011.000 - Unsafe AI Artifacts
AML.T0025 - Exfiltration via Cyber Means
AML.T0049 - Exploit Public-Facing Application
AML.T0055 - Unsecured Credentials

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.9.4 - AI system security

NIST AI RMF

MANAGE 2.2 - Mechanisms to sustain the value of AI system output

OWASP LLM Top 10

LLM02 - Sensitive Information Disclosure
LLM07 - System Prompt Leakage

Frequently Asked Questions

What is CVE-2026-7482?

CVE-2026-7482 is a critical heap out-of-bounds read in the GGUF model loader of Ollama before v0.17.1. An unauthenticated attacker who can reach the server can submit a malformed GGUF file to /api/create, causing heap memory (API keys, environment variables, system prompts, and live user conversation data) to be written into the resulting model artifact, and can then use /api/push to upload that artifact to an attacker-controlled registry. Neither endpoint requires authentication in the upstream distribution, and exposure is amplified by the widely documented OLLAMA_HOST=0.0.0.0 configuration.

Is CVE-2026-7482 actively exploited?

No confirmed active exploitation of CVE-2026-7482 has been reported, but organizations should still patch proactively.

How to fix CVE-2026-7482?

1. Patch: Upgrade to Ollama v0.17.1 immediately (https://github.com/ollama/ollama/releases/tag/v0.17.1).
2. Network isolation: Restrict TCP port 11434 to trusted internal hosts via firewall rules; block all public internet access.
3. Authentication layer: If immediate patching is not possible, deploy an authenticating reverse proxy (nginx, Caddy, or Traefik with basic auth or OAuth) in front of /api/create and /api/push endpoints.
4. Detection: Audit access logs for unexpected POST requests to /api/create or /api/push from external IPs; monitor for outbound model push requests to unrecognized registry hosts; review deployed model artifacts for unexpected binary content.
5. Credential rotation: If exposure is suspected, immediately rotate all API keys and secrets present in the Ollama server environment.

What systems are affected by CVE-2026-7482?

This vulnerability affects the following AI/ML architecture patterns: LLM inference servers, model serving, local AI deployments, shared inference infrastructure, developer workstations with network-exposed APIs.

What is the CVSS score for CVE-2026-7482?

CVE-2026-7482 has a CVSS v3.1 base score of 9.1 (CRITICAL). The EPSS exploitation probability is 1.93%.

What is the AI security impact?

Affected AI Architectures

LLM inference servers, model serving, local AI deployments, shared inference infrastructure, developer workstations with network-exposed APIs

MITRE ATLAS Techniques

AML.T0006 Active Scanning

AML.T0011.000 Unsafe AI Artifacts

AML.T0025 Exfiltration via Cyber Means

AML.T0049 Exploit Public-Facing Application

AML.T0055 Unsecured Credentials

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: A.9.4

NIST AI RMF: MANAGE 2.2

OWASP LLM Top 10: LLM02, LLM07

What are the technical details?

Original Advisory

Ollama before 0.17.1 contains a heap out-of-bounds read vulnerability in the GGUF model loader. The /api/create endpoint accepts an attacker-supplied GGUF file in which the declared tensor offset and size exceed the file's actual length; during quantization in fs/ggml/gguf.go and server/quantization.go (WriteTo()), the server reads past the allocated heap buffer. The leaked memory contents may include environment variables, API keys, system prompts, and concurrent users' conversation data, and can be exfiltrated by uploading the resulting model artifact through the /api/push endpoint to an attacker-controlled registry. The /api/create and /api/push endpoints have no authentication in the upstream distribution. Default deployments bind to 127.0.0.1, but the documented OLLAMA_HOST=0.0.0.0 configuration is widely used in practice (large public-internet exposure observed).
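The class of check that prevents this kind of read can be sketched as follows: before touching tensor data, confirm that every declared offset and size stays within the file's actual length. This is a generic illustration, not the actual fix shipped in v0.17.1; the TensorEntry type and the data_start parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TensorEntry:
    """Hypothetical view of one tensor record from a model file header."""
    name: str
    offset: int  # bytes from the start of the tensor data region
    size: int    # declared length of the tensor data in bytes


def out_of_bounds_tensors(
    tensors: Iterable[TensorEntry], data_start: int, file_length: int
) -> List[str]:
    """Return names of tensors whose declared range exceeds the file."""
    bad = []
    for tensor in tensors:
        start = data_start + tensor.offset
        end = start + tensor.size
        # Python integers do not overflow; a fixed-width implementation
        # would also need to guard the additions above against wraparound.
        if tensor.offset < 0 or tensor.size < 0 or end > file_length:
            bad.append(tensor.name)
    return bad
```

A loader following this pattern would reject the file outright if the returned list is non-empty, rather than attempting to read the tensor data.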

Exploitation Scenario

An attacker scans the internet for hosts exposing port 11434 (Ollama's default API port) using Shodan or Censys, identifying instances configured with OLLAMA_HOST=0.0.0.0. They craft a GGUF model file where tensor offset and size fields exceed the actual file length, then POST this file to the unauthenticated /api/create endpoint. During quantization in fs/ggml/gguf.go and server/quantization.go, Ollama reads past the allocated heap buffer boundary, capturing adjacent heap memory — including environment variables with API keys, active user conversation content, and system prompt data — into the resulting model artifact. The attacker then calls the equally unauthenticated /api/push endpoint, specifying an attacker-controlled registry URL, to silently exfiltrate the artifact containing the leaked memory dump. The entire operation requires no credentials, no user interaction, and no elevated access.
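The exfiltration step depends on the push destination, so one defensive angle is to compare the registry host in a pushed model name against an allowlist. The sketch below assumes model names follow a host/namespace/model:tag layout and that names without a host resolve to registry.ollama.ai; both the parsing rule and the helper names are assumptions for illustration, not Ollama's own name parser.

```python
DEFAULT_REGISTRY = "registry.ollama.ai"  # assumed default when no host is given


def registry_host(model_name: str) -> str:
    """Extract the registry host from a model name (assumed layout)."""
    name = model_name.strip()
    if "://" in name:                  # drop an optional scheme
        name = name.split("://", 1)[1]
    parts = name.split("/")
    if len(parts) >= 3:                # host/namespace/model[:tag]
        return parts[0].lower()
    return DEFAULT_REGISTRY            # model[:tag] or namespace/model[:tag]


def is_unrecognized_push(model_name: str, allowed_hosts: set) -> bool:
    """True if the push target is not on the allowlist."""
    return registry_host(model_name) not in allowed_hosts
```

For instance, with an allowlist containing only the default registry, a push of `evil.example.net/ns/leak:latest` would be flagged, while `alice/mymodel:latest` would not.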

Weaknesses (CWE)

CWE-125 Out-of-bounds Read

CWE-125 — Out-of-bounds Read: The product reads data past the end, or before the beginning, of the intended buffer.

[Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation. However, denylis
[Architecture and Design] Use a language that provides appropriate memory abstractions.

Source: MITRE CWE corpus.