CVE-2026-7141: vLLM uninitialized KV cache memory

CISO Take

CVE-2026-7141 is an uninitialized resource vulnerability (CWE-908) in vLLM's KV block handler for Mamba-architecture models, remotely exploitable without authentication on all versions prior to 0.19.1. With 126 downstream dependents, the exposure surface is meaningful for organizations running LLM inference infrastructure with Mamba or hybrid Mamba-Transformer model support, although the EPSS score of 0.29% is low (higher than only about 20% of all CVEs). The CISA SSVC decision of Track* and absence from CISA KEV indicate no confirmed active exploitation, though a public proof of concept exists, and high attack complexity (AC:H) raises the practical bar for successful exploitation. Even so, the network-accessible attack vector and zero-privilege requirement make this a legitimate data leakage risk on multi-tenant inference endpoints. Upgrade to vllm 0.19.1 (patch commit 1ad67864) immediately; if patching is not feasible, restrict the inference API to authenticated internal networks and audit all Mamba model deployments.

Sources: NVD, EPSS, GitHub Advisory, ATLAS, CISA KEV

What is the risk?

Medium risk overall. The CVSS score of 5.6 is anchored by High Attack Complexity, which significantly narrows practical exploitability: an attacker must understand vllm internals and specifically exercise the Mamba layer KV cache allocation path to trigger the flaw reliably. The zero-privilege, no-user-interaction, network-accessible profile is concerning for publicly exposed inference endpoints, but the uniformly Low confidentiality, integrity, and availability impacts (C:L/I:L/A:L) cap the per-incident blast radius. The 126 downstream dependents and 42 prior CVEs in the same package indicate a systemic risk posture in the vllm supply chain that warrants sustained attention beyond this individual CVE.

How does the attack unfold?

Reconnaissance

Adversary scans for publicly accessible vllm inference endpoints and identifies instances running Mamba or hybrid Mamba-Transformer models on versions prior to 0.19.1.

AML.T0006

Initial Access

Adversary sends unauthenticated HTTP requests to the vllm OpenAI-compatible inference API, probing endpoints to confirm Mamba model availability and vulnerable version.

AML.T0049

Exploitation

Crafted inference requests exercise the `has_mamba_layers` function in kv_cache_interface.py, triggering allocation of an uninitialized KV cache block containing residual heap memory from prior allocations.

AML.T0040

Impact

Residual memory in the uninitialized KV cache block exposes fragments of prior inference sessions (partial prompts or outputs), enabling cross-tenant data leakage in multi-tenant deployments and potential KV cache corruption causing incorrect model outputs.

AML.T0024
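
Defenders can check their own exposure to the first two steps. The sketch below asks an endpoint you operate for its model list without credentials and reports whether it answered. It assumes the server exposes the OpenAI-compatible `/v1/models` route; the function names and result labels are illustrative, not part of vLLM. A model list alone does not reveal the vLLM version, so an "open" result only indicates that the endpoint deserves a closer look.

```python
import json
import urllib.error
import urllib.request


def classify_response(status, body):
    """Map an HTTP status and body to (label, model_ids). Labels are illustrative."""
    if status in (401, 403):
        return "auth-required", []
    if status != 200:
        return "unexpected-status", []
    try:
        payload = json.loads(body)
    except ValueError:
        return "unexpected-body", []
    entries = payload.get("data", []) if isinstance(payload, dict) else []
    return "open", [e.get("id", "") for e in entries if isinstance(e, dict)]


def probe_models_endpoint(base_url, timeout=5.0):
    """Request /v1/models with no Authorization header and classify the reply."""
    url = base_url.rstrip("/") + "/v1/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_response(resp.status, resp.read())
    except urllib.error.HTTPError as exc:
        # urllib raises for 4xx/5xx; classify by status code only.
        return classify_response(exc.code, b"")
    except (urllib.error.URLError, OSError):
        return "unreachable", []
```

For example, `probe_models_endpoint("http://127.0.0.1:8000")` against a local test instance returns a label and any model identifiers. Run it only against hosts you are authorized to test.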

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
vLLM	pip	< 0.19.1	`0.19.1`
83.4K, 130 dependents, pushed 2d ago, 34% patched, ~32d to patch

Do you run vLLM older than 0.19.1 with Mamba or hybrid Mamba-Transformer models? You're affected.

How severe is it?

CVSS 3.1

5.6 / 10

EPSS

0.3%

chance of exploitation in 30 days

Higher than 20% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Moderate

Exploitation Confidence

medium

CISA SSVC: Public PoC

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Network

AC High

PR None

UI None

S Unchanged

C Low

I Low

A Low
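
The 5.6 base score can be reproduced from these metrics. The sketch below applies the CVSS v3.1 base-score equations for the Scope: Unchanged case only; the vector string is assembled from the values listed above, and the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(vector):
    """Compute the base score for a v3.1 vector whose Scope is Unchanged."""
    metrics = dict(
        part.split(":") for part in vector.split("/") if not part.startswith("CVSS")
    )
    if metrics["S"] != "U":
        raise ValueError("this sketch only covers Scope: Unchanged")

    def weight(name):
        return WEIGHTS[name][metrics[name]]

    iss = 1 - (1 - weight("C")) * (1 - weight("I")) * (1 - weight("A"))
    impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * weight("AV") * weight("AC") * weight("PR") * weight("UI")
    return roundup(min(impact + exploitability, 10))


print(base_score_scope_unchanged("CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:L/I:L/A:L"))  # 5.6
```

The impact sub-score is about 3.37 and the exploitability sub-score about 2.22. Changing AC from High to Low raises exploitability to about 3.89 and the base score to 7.3, which is why attack complexity is the metric holding the score in the medium band.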

What should I do?

5 steps

1. Patch immediately: upgrade vllm to >= 0.19.1 (patch commit 1ad67864c0c20f167929e64c875f5c28e1aad9fd). Verify with `pip show vllm | grep Version`.
2. Workaround if patching is blocked: restrict the vllm inference API (default port 8000) to authenticated internal traffic via an API gateway or network perimeter control. Do not expose vulnerable versions to the public internet.
3. Detection: monitor inference API logs for anomalous error patterns in KV cache operations or exceptions in the Mamba layer handling path; alert on unexpected memory-related errors in vllm server logs.
4. Dependency audit: run `pip show vllm` across all inference nodes and CI/CD environments; scan the full dependency graph for indirect vllm consumers among the 126 known downstream packages.
5. Prioritize: focus remediation on multi-tenant or customer-facing inference endpoints before internal/single-tenant deployments.
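
The version check in the first step can be scripted for use across many nodes. This is a minimal sketch using only the standard library: it reads the installed `vllm` version with `importlib.metadata` and compares it with 0.19.1. The helper names are illustrative, and the version parsing is deliberately simplified (it treats a pre-release of 0.19.1, such as `0.19.1rc1`, as still vulnerable, which is an assumption rather than something the advisory states).

```python
import re
from importlib import metadata

PATCHED = (0, 19, 1)  # first fixed release according to the advisory


def parse_release(version):
    """Return (numeric release tuple, is_prerelease) for a version string."""
    text = version.strip()
    match = re.match(r"\d+(?:\.\d+)*", text)
    if not match:
        raise ValueError(f"unrecognized version: {version!r}")
    release = tuple(int(part) for part in match.group(0).split("."))
    suffix = text[match.end():]
    is_pre = bool(re.match(r"[-_.]?(a|b|rc|dev|alpha|beta|pre)", suffix, re.I))
    return release, is_pre


def is_vulnerable(version):
    """True if the version is older than 0.19.1 (or a pre-release of it)."""
    release, is_pre = parse_release(version)
    width = max(len(release), len(PATCHED))
    release = release + (0,) * (width - len(release))
    patched = PATCHED + (0,) * (width - len(PATCHED))
    if release < patched:
        return True
    return release == patched and is_pre


def installed_vllm_version():
    """Return the installed vllm version string, or None if vllm is absent."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vllm is not installed in this environment")
    elif is_vulnerable(found):
        print(f"vllm {found} is in the vulnerable range (< 0.19.1)")
    else:
        print(f"vllm {found} is at or above the patched release")
```

Run it with the same interpreter that serves the model, since virtual environments and containers may each carry a different vllm build.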

What does CISA's SSVC say?

Decision Track*

Exploitation poc

Automatable No

Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Data Leakage, DoS, Inference Framework, AML.T0010.001 - AI Software, AML.T0024 - Exfiltration via AI Inference API, AML.T0040 - AI Model Inference API Access, AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.9.4 - Information security for AI systems

NIST AI RMF

MANAGE 2.2 - Mechanisms are in place to sustain the value of deployed AI systems

OWASP LLM Top 10

LLM03 - Supply Chain

Frequently Asked Questions

What is CVE-2026-7141?

CVE-2026-7141 is an uninitialized resource vulnerability (CWE-908) in vLLM's KV block handler for Mamba-architecture models, remotely exploitable without authentication on all versions prior to 0.19.1. High attack complexity (AC:H) raises the bar for exploitation, but on multi-tenant inference endpoints the flaw is a legitimate data leakage risk. It is fixed in vllm 0.19.1 (patch commit 1ad67864).

Is CVE-2026-7141 actively exploited?

No confirmed active exploitation of CVE-2026-7141 has been reported, but organizations should still patch proactively.

How to fix CVE-2026-7141?

1. Patch immediately: upgrade vllm to >= 0.19.1 (patch commit 1ad67864c0c20f167929e64c875f5c28e1aad9fd). Verify with `pip show vllm | grep Version`.
2. Workaround if patching is blocked: restrict the vllm inference API (default port 8000) to authenticated internal traffic via an API gateway or network perimeter control. Do not expose vulnerable versions to the public internet.
3. Detection: monitor inference API logs for anomalous error patterns in KV cache operations or exceptions in the Mamba layer handling path; alert on unexpected memory-related errors in vllm server logs.
4. Dependency audit: run `pip show vllm` across all inference nodes and CI/CD environments; scan the full dependency graph for indirect vllm consumers among the 126 known downstream packages.
5. Prioritize: focus remediation on multi-tenant or customer-facing inference endpoints before internal/single-tenant deployments.
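
For the dependency audit, one hedged starting point is to list which installed packages declare vllm as a requirement. The sketch below walks the current environment with `importlib.metadata`; the function names are illustrative, and it only finds direct dependents installed in the environment where it runs, so it complements rather than replaces a full dependency-graph scan.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name for comparison (case and separator insensitive)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the normalized project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else ""


def find_direct_dependents(target="vllm"):
    """Return sorted names of installed distributions that require `target`."""
    wanted = normalize(target)
    dependents = set()
    for dist in metadata.distributions():
        requirements = dist.requires or []
        if any(requirement_name(req) == wanted for req in requirements):
            dependents.add(dist.metadata.get("Name") or "<unknown>")
    return sorted(dependents)


if __name__ == "__main__":
    for name in find_direct_dependents("vllm"):
        print(name)
```

An empty result means nothing in that environment declares vllm as a dependency; vendored copies or containers built from other images would still need their own check.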

What systems are affected by CVE-2026-7141?

This vulnerability affects the following AI/ML architecture patterns: LLM inference servers, Multi-tenant model serving, Mamba and SSM model deployments, Hybrid Transformer-Mamba architectures, OpenAI-compatible API proxies.

What is the CVSS score for CVE-2026-7141?

CVE-2026-7141 has a CVSS v3.1 base score of 5.6 (MEDIUM). The EPSS exploitation probability is 0.29%.

What is the AI security impact?

Affected AI Architectures

LLM inference servers, Multi-tenant model serving, Mamba and SSM model deployments, Hybrid Transformer-Mamba architectures, OpenAI-compatible API proxies

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0024 Exfiltration via AI Inference API

AML.T0040 AI Model Inference API Access

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: A.9.4

NIST AI RMF: MANAGE 2.2

OWASP LLM Top 10: LLM03

What are the technical details?

Original Advisory

A vulnerability was found in vllm up to 0.19.0. The affected element is the function has_mamba_layers of the file vllm/v1/kv_cache_interface.py of the component KV Block Handler. Performing a manipulation results in uninitialized resource. It is possible to initiate the attack remotely. The attack is considered to have high complexity. The exploitability is described as difficult. The exploit has been made public and could be used. The patch is named 1ad67864c0c20f167929e64c875f5c28e1aad9fd. To fix this issue, it is recommended to deploy a patch.

Exploitation Scenario

An adversary identifies a publicly accessible vllm inference endpoint (e.g., OpenAI-compatible API on port 8000) running a Jamba or Mamba2 model. With knowledge of vllm's internal KV cache architecture, they craft inference requests specifically designed to exercise the `has_mamba_layers` code path in `vllm/v1/kv_cache_interface.py`, triggering allocation of a KV cache block without proper initialization. The uninitialized block retains residual heap data from previous allocations, which may include fragments of prior inference sessions from other tenants — partial user prompts, intermediate model outputs, or other sensitive context. The adversary repeats this technique across multiple requests to reconstruct a broader picture of prior inference activity. Additionally, corrupting the KV cache state through integrity manipulation (I:L) can cause the model to produce incorrect outputs for other concurrent users, eroding trust in the service.

Weaknesses (CWE)

CWE-908 Use of Uninitialized Resource (Primary)

CWE-908 — Use of Uninitialized Resource: The product uses or accesses a resource that has not been initialized.

[Implementation] Explicitly initialize the resource before use. If this is performed through an API function or standard procedure, follow all required steps.
[Implementation] Pay close attention to complex conditionals that affect initialization, since some branches might not perform the initialization.
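
The weakness class and the first mitigation can be illustrated with a toy example. The block pool below is a hypothetical stand-in written for this page and is not vLLM's allocator or the patched code: it recycles buffers between callers, and a buffer handed out without being cleared still carries the previous caller's bytes.

```python
class BlockPool:
    """Toy pool that recycles fixed-size buffers (hypothetical, not vLLM code)."""

    def __init__(self, num_blocks, block_size):
        self._free = [bytearray(block_size) for _ in range(num_blocks)]

    def allocate(self, zero_on_alloc):
        """Hand out a recycled block, optionally clearing it first."""
        block = self._free.pop()
        if zero_on_alloc:
            # Explicit initialization: overwrite whatever the last user left.
            block[:] = bytes(len(block))
        return block

    def release(self, block):
        """Return a block to the pool without touching its contents."""
        self._free.append(block)


pool = BlockPool(num_blocks=1, block_size=16)

# Tenant A writes data into its block and releases it.
first = pool.allocate(zero_on_alloc=True)
first[:6] = b"secret"
pool.release(first)

# Tenant B receives the same block uninitialized and can read A's data.
leaked = pool.allocate(zero_on_alloc=False)
residual = bytes(leaked[:6])
pool.release(leaked)

# With initialization on allocation, the residual data is gone.
clean = pool.allocate(zero_on_alloc=True)
is_clean = clean == bytearray(16)
```

Here `residual` holds tenant A's bytes, while `is_clean` is true once the block is cleared before reuse. The second mitigation matters for the same reason: if clearing happens only on some branches, any path that skips it reintroduces the leak.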

Source: MITRE CWE corpus.