CVE-2026-56340: vLLM sparse tensor DoS/memory corruption

CISO Take

vLLM versions 0.10.2 through 0.12.x do not validate sparse tensor indices in multimodal embedding requests. When the prompt-embeds feature is enabled, any authenticated user can crash the inference server, and potentially corrupt process memory, by submitting crafted negative or out-of-bounds indices. The CVSS score is 8.8: the vector is network-reachable, attack complexity is low, and only low privileges are required. Triggering a crash takes little skill; an attacker with a valid API token can reliably disrupt production LLM serving infrastructure. The flaw is a continuation of CVE-2025-62164, whose remediation only disabled the feature by default instead of fixing the underlying validation gap, so any deployment that re-enabled prompt-embeds for multimodal workflows remains exposed. Upgrade to vLLM 0.13.0 or later immediately. As an interim workaround, confirm that prompt-embeds is explicitly disabled in your serving configuration and restrict inference endpoint access to trusted clients.

Sources: NVD, GitHub Advisory, MITRE ATLAS

What is the risk?

High risk for organizations running vLLM in production LLM serving environments. The CVSS 8.8 score reflects network accessibility, low attack complexity, and low privilege requirements, a combination that makes exploitation straightforward for any user with API access. The potential escalation from DoS to write-what-where memory corruption makes this more than an availability issue, adding integrity and confidentiality risks. PyTorch's default of disabling sparse tensor invariant checks widens the attack surface: the validation gap is architectural rather than incidental. There are no public exploits and no CISA KEV listing at this time, but the low exploitation barrier and wide deployment of vLLM in production inference stacks warrant urgent remediation, particularly for multi-tenant or externally exposed inference endpoints.

How does the attack unfold?

Initial Access

Attacker obtains low-privilege API credentials to a vLLM inference endpoint serving multimodal models with the prompt-embeds feature enabled.

AML.T0040

Payload Crafting

Attacker constructs a multimodal embedding request containing sparse tensor data with deliberately malformed negative or out-of-bounds indices, leveraging knowledge that PyTorch disables sparse tensor invariant checks by default.

AML.T0043

Exploitation

vLLM processes the malformed embedding request without index validation, causing a worker process crash or heap corruption through the unvalidated write-what-where primitive in sparse tensor operations.

AML.T0049
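To make the exploitation step concrete, the following is a pure-Python analogy, not vLLM or PyTorch code. It assumes a 2-D tensor in COO layout (one row of coordinates per dimension plus a list of values, as PyTorch's COO format uses); `naive_to_dense` is a hypothetical helper that scatters entries into a dense buffer without checking indices. Python's list indexing happens to reproduce both outcomes in miniature: a negative index silently wraps to the end of a row (a write to a location the sender never declared), and an out-of-bounds index raises `IndexError` (the crash). A native kernel running with invariant checks disabled has no such guardrail, which is where the out-of-bounds write risk comes from.

```python
from typing import List, Sequence


def naive_to_dense(indices: Sequence[Sequence[int]], values: Sequence[float],
                   shape: Sequence[int]) -> List[List[float]]:
    """Scatter COO entries into a dense 2-D list with NO index validation.

    Hypothetical illustration only: mirrors what an unchecked densify does.
    """
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    # indices[0] holds row coordinates, indices[1] holds column coordinates.
    for r, c, v in zip(indices[0], indices[1], values):
        dense[r][c] = v  # no bounds check: trusts caller-supplied r and c
    return dense


# Well-formed input: the entry lands where intended.
ok = naive_to_dense([[0], [1]], [1.0], (2, 3))

# Negative column index: Python wraps -1 to the last column, so the
# value is written somewhere the sender never declared.
wrapped = naive_to_dense([[0], [-1]], [9.0], (2, 3))

# Out-of-bounds row index: Python raises IndexError; a native kernel
# without invariant checks could instead write past the buffer.
try:
    naive_to_dense([[5], [0]], [9.0], (2, 3))
    crashed = False
except IndexError:
    crashed = True
```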

Impact

The inference service crashes, denying service to all users, with potential escalation to process memory corruption that could enable model weight exfiltration or arbitrary code execution within the vLLM container.

AML.T0029


What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
vLLM	pip	>= 0.10.2, < 0.13.0	0.13.0

Do you run vLLM 0.10.2 through 0.12.x with prompt-embeds enabled? You're affected.

How severe is it?

CVSS 3.1: 8.8 / 10 (High)

EPSS: N/A

Exploitation Status: No known exploitation

Sophistication: Moderate

What is the attack surface?

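Attack Vector (AV): Network

Attack Complexity (AC): Low

Privileges Required (PR): Low

User Interaction (UI): None

Scope (S): Unchanged

Confidentiality (C): High

Integrity (I): High

Availability (A): High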
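Taken together, these metrics correspond to the vector CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H. As a sanity check on the 8.8 score, the sketch below applies the CVSS v3.1 base-score equations for Scope Unchanged, using the metric weights from the FIRST specification. Only the weights needed for this vector are included, and the function name is illustrative.

```python
import math

# CVSS v3.1 metric weights (FIRST CVSS v3.1 specification),
# restricted to the values used by this CVE's vector.
AV_NETWORK = 0.85
AC_LOW = 0.77
PR_LOW_UNCHANGED = 0.62   # PR:L weight when Scope is Unchanged
UI_NONE = 0.85
CIA_HIGH = 0.56


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    iss = 1 - (1 - c) * (1 - i) * (1 - a)   # Impact Sub-Score
    impact = 6.42 * iss                      # Scope Unchanged impact formula
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(AV_NETWORK, AC_LOW, PR_LOW_UNCHANGED,
                                   UI_NONE, CIA_HIGH, CIA_HIGH, CIA_HIGH)
print(score)  # 8.8
```

Here the impact term is about 5.87 and the exploitability term about 2.84; their sum (about 8.71) rounds up to 8.8.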

What should I do?


Patch: Upgrade vLLM to version 0.13.0 or later — the definitive fix addressing the root input validation flaw.
Workaround: If immediate patching is not possible, explicitly disable the prompt-embeds feature in your vLLM serving configuration; do not rely solely on the CVE-2025-62164 default-disable behavior, as configuration drift or intentional re-enabling for multimodal workflows is common.
Network controls: Restrict inference API endpoints to authenticated, trusted clients only; implement rate limiting on embedding submission endpoints.
Detection: Monitor vLLM worker processes for unexpected crashes or OOM events; log malformed or anomalously structured embedding requests and alert on patterns.
Audit: Inventory all deployments that may have re-enabled prompt-embeds after CVE-2025-62164; any such system running an affected version (0.10.2 through 0.12.x) has been exposed for as long as it ran that version.
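The workaround and audit steps can be partly automated. The sketch below is hypothetical tooling, not part of vLLM: it assumes you already have an inventory of deployments with their vLLM version and launch arguments, and it assumes prompt-embeds is switched on by a CLI flag named `--enable-prompt-embeds`. Confirm the exact flag (and any config-file or Python API equivalent) against the documentation for your vLLM version before relying on it. It flags only deployments inside this CVE's affected range (>= 0.10.2, < 0.13.0) that have the feature on.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

AFFECTED_MIN = (0, 10, 2)   # first affected release, per the advisory
FIXED = (0, 13, 0)          # first release with the root-cause fix

# ASSUMPTION: flag name; verify against your vLLM version's CLI docs.
PROMPT_EMBEDS_FLAG = "--enable-prompt-embeds"


@dataclass
class Deployment:
    name: str
    vllm_version: str          # plain "X.Y.Z"; pre-release suffixes not handled
    launch_args: Sequence[str]


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in text.split(".")[:3])


def is_exposed(dep: Deployment) -> bool:
    in_range = AFFECTED_MIN <= parse_version(dep.vllm_version) < FIXED
    # Exact-token match only; "--flag=value" forms would need extra handling.
    embeds_on = PROMPT_EMBEDS_FLAG in dep.launch_args
    return in_range and embeds_on


def audit(deployments: List[Deployment]) -> List[str]:
    """Return names of deployments that need patching or the workaround."""
    return [d.name for d in deployments if is_exposed(d)]


fleet = [
    Deployment("chat-prod", "0.11.0", ["--model", "m", "--enable-prompt-embeds"]),
    Deployment("batch", "0.12.0", ["--model", "m"]),
    Deployment("vision", "0.13.0", ["--enable-prompt-embeds"]),
    Deployment("legacy", "0.9.2", ["--enable-prompt-embeds"]),
]
print(audit(fleet))  # ['chat-prod']
```

The version parser deliberately handles only plain X.Y.Z strings; pre-release or local version tags (for example `rc` builds) would need a proper parser such as `packaging.version`.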

How is it classified?

Tags: DoS, Code Execution, Inference API

AML.T0029 - Denial of AI Service
AML.T0034.001 - Resource-Intensive Queries
AML.T0040 - AI Model Inference API Access
AML.T0043 - Craft Adversarial Data
AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Article 9 - Risk management system

ISO 42001: 8.4 - AI system operation and monitoring

NIST AI RMF: MANAGE 2.2 - Mechanisms to sustain AI system robustness

OWASP LLM Top 10 (2023 edition): LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2026-56340?

CVE-2026-56340 is a high-severity (CVSS 8.8) input validation flaw in vLLM 0.10.2 through 0.12.x. When the prompt-embeds feature is enabled, crafted negative or out-of-bounds sparse tensor indices in multimodal embedding requests can crash the inference server and may corrupt process memory. It continues CVE-2025-62164 and is fixed in vLLM 0.13.0.

Is CVE-2026-56340 actively exploited?

No confirmed active exploitation of CVE-2026-56340 has been reported, but organizations should still patch proactively.

How to fix CVE-2026-56340?

Upgrade vLLM to 0.13.0 or later. If you cannot patch immediately, explicitly disable prompt-embeds rather than relying on the CVE-2025-62164 default, restrict inference endpoints to authenticated, trusted clients with rate limiting, monitor vLLM workers for unexpected crashes or OOM events, and audit deployments that re-enabled prompt-embeds after CVE-2025-62164.

What systems are affected by CVE-2026-56340?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, Multimodal AI pipelines, Model serving infrastructure, API-exposed inference endpoints, RAG pipelines with custom embeddings.

What is the CVSS score for CVE-2026-56340?

CVE-2026-56340 has a CVSS v3.1 base score of 8.8 (HIGH).

What is the AI security impact?

Affected AI Architectures

LLM inference serving, Multimodal AI pipelines, Model serving infrastructure, API-exposed inference endpoints, RAG pipelines with custom embeddings

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service

AML.T0034.001 Resource-Intensive Queries

AML.T0040 AI Model Inference API Access

AML.T0043 Craft Adversarial Data

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 9

ISO 42001: 8.4

NIST AI RMF: MANAGE 2.2

OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

vLLM versions >= 0.10.2 and < 0.13.0 are missing sparse tensor validation in multimodal embeddings processing. Because PyTorch disables sparse tensor invariant checks by default, an attacker can submit crafted embedding requests with malformed (negative or out-of-bounds) tensor indices, when the prompt-embeds feature is enabled, to trigger crashes or resource exhaustion (denial of service), with potential for out-of-bounds/write-what-where memory corruption. This continues CVE-2025-62164, whose prior fix only disabled the feature by default rather than addressing the root cause.
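The root cause is a missing bounds check before sparse data is used. The sketch below shows the kind of validation the advisory describes as absent, as a standalone pure-Python function; `validate_coo_indices` and `SparseIndexError` are hypothetical names, and this is not the patch shipped in vLLM 0.13.0. Within PyTorch, the corresponding safeguard is to re-enable the invariant checks that are off by default, for example via the `check_invariants=True` argument to `torch.sparse_coo_tensor` or the `torch.sparse.check_sparse_tensor_invariants` context manager; which mechanism the upstream fix uses is not stated here.

```python
from typing import Sequence


class SparseIndexError(ValueError):
    """Raised when a COO tensor's indices violate basic invariants."""


def validate_coo_indices(indices: Sequence[Sequence[int]],
                         values: Sequence[float],
                         shape: Sequence[int]) -> None:
    """Reject malformed COO indices before any dense conversion.

    indices: one row per dimension, each row holding nnz coordinates
    values:  nnz values
    shape:   declared dense shape
    """
    if len(indices) != len(shape):
        raise SparseIndexError("indices must have one row per dimension")
    nnz = len(values)
    for dim, (row, size) in enumerate(zip(indices, shape)):
        if len(row) != nnz:
            raise SparseIndexError(f"dim {dim}: expected {nnz} coordinates")
        for idx in row:
            # bool is a subclass of int; reject it along with floats etc.
            if not isinstance(idx, int) or isinstance(idx, bool):
                raise SparseIndexError(f"dim {dim}: non-integer index {idx!r}")
            # The two cases the advisory names: negative and out-of-bounds.
            if idx < 0 or idx >= size:
                raise SparseIndexError(
                    f"dim {dim}: index {idx} outside [0, {size})")


# Usage: a well-formed tensor passes silently.
validate_coo_indices([[0, 1], [2, 0]], [1.0, 2.0], (2, 3))

# A negative column index is rejected before it can reach a kernel.
try:
    validate_coo_indices([[0], [-1]], [1.0], (2, 3))
except SparseIndexError as err:
    print(err)  # dim 1: index -1 outside [0, 3)
```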

Exploitation Scenario

An attacker with a low-privilege API token to a vLLM inference endpoint determines that the deployment runs an affected vLLM version (0.10.2 through 0.12.x) with prompt-embeds enabled for multimodal workflows. Using the public GitHub advisory and knowledge of PyTorch's sparse tensor internals, the attacker crafts a multimodal embedding request containing sparse tensor data with deliberately negative or out-of-bounds indices. Because vLLM does not validate these indices before passing them to PyTorch, and PyTorch disables sparse tensor invariant checks by default, the malformed tensor propagates into processing. At minimum this crashes the vLLM worker process, taking inference down for all users. With further refinement, targeting specific memory offsets via the write-what-where primitive, the attacker could corrupt the inference process's heap to achieve code execution within the vLLM container, gaining access to loaded model weights, in-memory inference data, or downstream service credentials.

Weaknesses (CWE)

CWE-20 Improper Input Validation (Primary)

CWE-20 — Improper Input Validation: The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.

[Architecture and Design] Consider using language-theoretic security (LangSec) techniques that characterize inputs using a formal language and build "recognizers" for that language. This effectively requires parsing to be a distinct layer that effectively enforces a boundary between raw input and internal data representations, instead of allowing parser code to be scattered throughout the program, where it could be subject to errors or inconsistencies that create weaknesses. [REF-1109] [REF-1110] [REF-1111]
[Architecture and Design] Use an input validation framework such as Struts or the OWASP ESAPI Validation API. Note that using a framework does not automatically address all input validation problems; be mindful of weaknesses that could arise from misusing the framework itself (CWE-1173).
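Applied to this CVE, the LangSec advice above means treating the raw embedding payload as untrusted input and converting it into an internal type only through a single recognizer. The sketch below is an illustration under stated assumptions: the request is modelled as a plain dict with `shape`, `indices`, and `values` fields (a hypothetical wire format, not vLLM's), and the size limits are placeholder values. Besides index bounds, it caps dimensionality, declared dense size, and entry count, which addresses the resource-exhaustion side of the advisory.

```python
from dataclasses import dataclass
from math import prod
from typing import Any, Mapping, Tuple

# Hypothetical limits; tune to the largest embedding your model accepts.
MAX_NDIM = 2
MAX_DENSE_ELEMENTS = 1_000_000
MAX_NNZ = 100_000


class RejectedInput(ValueError):
    pass


@dataclass(frozen=True)
class SparseEmbedding:
    """Internal representation; only recognize_sparse_embedding builds it."""
    shape: Tuple[int, ...]
    indices: Tuple[Tuple[int, ...], ...]
    values: Tuple[float, ...]


def _int_tuple(obj: Any, what: str) -> Tuple[int, ...]:
    if not isinstance(obj, (list, tuple)):
        raise RejectedInput(f"{what} must be a list")
    if not all(isinstance(x, int) and not isinstance(x, bool) for x in obj):
        raise RejectedInput(f"{what} must contain only integers")
    return tuple(obj)


def recognize_sparse_embedding(raw: Mapping[str, Any]) -> SparseEmbedding:
    """Recognizer layer: accept a raw request dict or reject it outright."""
    if set(raw) != {"shape", "indices", "values"}:
        raise RejectedInput("unexpected or missing fields")
    shape = _int_tuple(raw["shape"], "shape")
    if not 1 <= len(shape) <= MAX_NDIM or any(s <= 0 for s in shape):
        raise RejectedInput("bad shape")
    if prod(shape) > MAX_DENSE_ELEMENTS:  # resource-exhaustion guard
        raise RejectedInput("dense size exceeds limit")
    values = raw["values"]
    if not isinstance(values, (list, tuple)) or len(values) > MAX_NNZ:
        raise RejectedInput("values missing or too many")
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool)
               for v in values):
        raise RejectedInput("values must be numeric")
    rows = raw["indices"]
    if not isinstance(rows, (list, tuple)) or len(rows) != len(shape):
        raise RejectedInput("indices must have one row per dimension")
    indices = []
    for dim, (row, size) in enumerate(zip(rows, shape)):
        coords = _int_tuple(row, f"indices[{dim}]")
        if len(coords) != len(values):
            raise RejectedInput(f"indices[{dim}] length mismatch")
        if any(c < 0 or c >= size for c in coords):
            raise RejectedInput(f"indices[{dim}] out of bounds")
        indices.append(coords)
    return SparseEmbedding(shape, tuple(indices), tuple(float(v) for v in values))
```

Because `SparseEmbedding` is frozen and only the recognizer constructs it in this sketch, downstream code can treat any instance as already validated. Python does not enforce that constructor restriction, so it is a convention here rather than a guarantee.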

Source: MITRE CWE corpus.