CVE-2026-56340: vLLM: sparse tensor DoS/memory corruption via embeddings
Severity: HIGH
vLLM versions 0.10.2 through 0.12.x fail to validate sparse tensor indices in multimodal embedding requests. When the prompt-embeds feature is enabled, any authenticated user can crash the inference server, or potentially corrupt process memory, by submitting crafted negative or out-of-bounds indices. With a CVSS score of 8.8, a network attack vector, and a low privilege requirement, exploitation requires minimal skill: an attacker with a valid API token can reliably disrupt production LLM serving infrastructure. This is particularly concerning because it is a continuation of CVE-2025-62164, whose remediation only disabled the feature by default rather than fixing the underlying validation flaw, so any deployment that re-enabled prompt-embeds for multimodal workflows remains exposed. Upgrade to vLLM 0.13.0 or later immediately. As an interim workaround, confirm that prompt-embeds is explicitly disabled in your serving configuration and restrict inference endpoint access to trusted clients only.
What is the risk?
Risk is high for organizations running vLLM in production LLM serving environments. The CVSS 8.8 score reflects network accessibility, low attack complexity, and low privilege requirements, a combination that makes exploitation straightforward for any user with API access. The potential escalation from DoS to write-what-where memory corruption takes this beyond an availability issue and introduces integrity and confidentiality risks. Because PyTorch disables sparse tensor invariant checks by default, malformed indices are not rejected downstream when vLLM fails to validate them itself. No public exploits are known and the CVE is not listed in the CISA KEV catalog at this time, but the low exploitation barrier and wide deployment of vLLM in production inference stacks warrant urgent remediation, particularly for multi-tenant or externally exposed inference endpoints.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vLLM | pip | >= 0.10.2, < 0.13.0 | 0.13.0 |
Do you use vLLM? You are affected if you run a version from 0.10.2 through 0.12.x with the prompt-embeds feature enabled.
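A quick way to check a host against the range above is to compare the installed package version with the advisory's bounds. The following is a minimal sketch, not an official checker: the helper names `release_tuple` and `is_affected` are hypothetical, and the parser only looks at the leading numeric release, so a pre-release such as a release candidate of 0.13.0 would be treated like the final release and should be reviewed by hand.

```python
import re
from importlib import metadata

AFFECTED_MIN = (0, 10, 2)  # first affected release, per the advisory
FIRST_FIXED = (0, 13, 0)   # first patched release, per the advisory


def release_tuple(version: str) -> tuple[int, int, int]:
    """Extract the leading numeric release (major, minor, patch) from a version string."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) if part else 0 for part in match.groups())
    return (major, minor, patch)


def is_affected(version: str) -> bool:
    """True when the release falls in the range >= 0.10.2 and < 0.13.0."""
    return AFFECTED_MIN <= release_tuple(version) < FIRST_FIXED


if __name__ == "__main__":
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        status = "in the affected range" if is_affected(installed) else "outside the affected range"
        print(f"vllm {installed}: {status}")
```

A version inside the range is only exploitable when prompt-embeds is enabled, so this check narrows the inventory rather than confirming exposure.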
What should I do?
1. Patch: Upgrade vLLM to version 0.13.0 or later. This is the definitive fix and addresses the root input validation flaw.
2. Workaround: If immediate patching is not possible, explicitly disable the prompt-embeds feature in your vLLM serving configuration; do not rely solely on the CVE-2025-62164 default-disable behavior, as configuration drift or intentional re-enabling for multimodal workflows is common.
3. Network controls: Restrict inference API endpoints to authenticated, trusted clients only; implement rate limiting on embedding submission endpoints.
4. Detection: Monitor vLLM worker processes for unexpected crashes or OOM events; log malformed or anomalously structured embedding requests and alert on patterns.
5. Audit: Inventory all deployments that may have re-enabled prompt-embeds after CVE-2025-62164. Those systems have been exposed for the full affected version range.
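The audit step can be partly automated by scanning the launch commands recorded for each deployment. The sketch below assumes the feature is switched on with a command-line flag named `--enable-prompt-embeds`; confirm the exact flag or configuration key against the documentation for your vLLM version. The deployment names, model names, and the `audit` helper are hypothetical and used only for illustration.

```python
import shlex

# Assumption: this is the flag that turns the feature on. Verify it for your
# vLLM version before relying on the result.
PROMPT_EMBEDS_FLAG = "--enable-prompt-embeds"


def enables_prompt_embeds(command_line: str) -> bool:
    """True when the launch command contains the assumed prompt-embeds flag."""
    return PROMPT_EMBEDS_FLAG in shlex.split(command_line)


def audit(deployments: dict[str, str]) -> list[str]:
    """Return the names of deployments whose launch command enables the feature."""
    return sorted(
        name for name, command in deployments.items()
        if enables_prompt_embeds(command)
    )


if __name__ == "__main__":
    # Hypothetical inventory: deployment name -> recorded launch command.
    inventory = {
        "chat-prod": "vllm serve example/chat-model --port 8000",
        "multimodal-prod": "vllm serve example/vision-model --enable-prompt-embeds",
    }
    for name in audit(inventory):
        print(f"{name}: prompt-embeds enabled, prioritize for patching")
```

A flag scan only covers deployments configured on the command line. Settings supplied through config files or programmatic engine arguments need a separate review.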
Frequently Asked Questions
Is CVE-2026-56340 actively exploited?
No confirmed active exploitation of CVE-2026-56340 has been reported, but organizations should still patch proactively.
What systems are affected by CVE-2026-56340?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, Multimodal AI pipelines, Model serving infrastructure, API-exposed inference endpoints, RAG pipelines with custom embeddings.
What is the CVSS score for CVE-2026-56340?
CVE-2026-56340 has a CVSS v3.1 base score of 8.8 (HIGH).
What is the AI security impact?
MITRE ATLAS Techniques
- AML.T0029 Denial of AI Service
- AML.T0034.001 Resource-Intensive Queries
- AML.T0040 AI Model Inference API Access
- AML.T0043 Craft Adversarial Data
- AML.T0049 Exploit Public-Facing Application
What are the technical details?
Original Advisory
vLLM versions >= 0.10.2 and < 0.13.0 are missing sparse tensor validation in multimodal embeddings processing. Because PyTorch disables sparse tensor invariant checks by default, an attacker can submit crafted embedding requests with malformed (negative or out-of-bounds) tensor indices, when the prompt-embeds feature is enabled, to trigger crashes or resource exhaustion (denial of service), with potential for out-of-bounds/write-what-where memory corruption. This continues CVE-2025-62164, whose prior fix only disabled the feature by default rather than addressing the root cause.
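The missing check described above amounts to confirming that every sparse index falls inside the declared tensor shape before the data reaches PyTorch. The following is a minimal pure-Python sketch of that kind of bounds check for COO-style indices. It is an illustration under stated assumptions, not the actual vLLM patch, and `validate_coo_indices` is a hypothetical helper name.

```python
from typing import Sequence


def validate_coo_indices(indices: Sequence[Sequence[int]], shape: Sequence[int]) -> None:
    """Raise ValueError unless every index lies inside `shape`.

    `indices` follows the COO layout: one row per tensor dimension and one
    column per stored element.
    """
    if len(indices) != len(shape):
        raise ValueError(
            f"expected {len(shape)} index rows, got {len(indices)}"
        )
    stored = len(indices[0]) if indices else 0
    for dim, (row, size) in enumerate(zip(indices, shape)):
        if len(row) != stored:
            raise ValueError(f"index row {dim} has {len(row)} entries, expected {stored}")
        for position, index in enumerate(row):
            # Rejects both negative and too-large indices.
            if not 0 <= index < size:
                raise ValueError(
                    f"index {index} at position {position} is outside "
                    f"[0, {size}) for dimension {dim}"
                )


if __name__ == "__main__":
    validate_coo_indices([[0, 1], [2, 0]], (2, 3))  # in bounds, no error
    try:
        validate_coo_indices([[0, -1], [0, 0]], (2, 3))
    except ValueError as error:
        print(f"rejected: {error}")
```

PyTorch also documents an opt-in switch, `torch.sparse.check_sparse_tensor_invariants`, and a `check_invariants` argument on `torch.sparse_coo_tensor`, which perform comparable checks when explicitly enabled. Whether the 0.13.0 fix relies on them is not stated in the advisory.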
Exploitation Scenario
An attacker with a low-privilege API token to a vLLM inference endpoint identifies that the deployment runs vLLM 0.10.2+ with prompt-embeds enabled for multimodal workflows. Referencing the public GitHub advisory and knowledge of PyTorch's sparse tensor internals, the attacker crafts a multimodal embedding request containing sparse tensor data with deliberately negative or out-of-bounds indices. Because vLLM does not validate these indices before passing them to PyTorch, and PyTorch disables sparse tensor invariant checks by default, the malformed tensor propagates into processing. At minimum this crashes the vLLM worker process, causing inference service downtime for all users. With further refinement — targeting specific memory offsets via the write-what-where primitive — the attacker could corrupt the inference process's heap to achieve code execution within the vLLM container, gaining access to loaded model weights, in-memory inference data, or downstream service credentials.
Weaknesses (CWE)
CWE-20 — Improper Input Validation: The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.
- [Architecture and Design] Consider using language-theoretic security (LangSec) techniques that characterize inputs using a formal language and build "recognizers" for that language. This effectively requires parsing to be a distinct layer that effectively enforces a boundary between raw input and internal data representations, instead of allowing parser code to be scattered throughout the program, where it could be subject to errors or inconsistencies that create weaknesses. [REF-1109] [REF-1110] [REF-1111]
- [Architecture and Design] Use an input validation framework such as Struts or the OWASP ESAPI Validation API. Note that using a framework does not automatically address all input validation problems; be mindful of weaknesses that could arise from misusing the framework itself (CWE-1173).
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
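To see how this vector produces the 8.8 base score, the calculation can be reproduced with the metric weights and rounding rule from the CVSS v3.1 specification. This is a small sketch that covers base metrics only; the function names are illustrative and it does no validation of malformed vectors.

```python
import math

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place using the spec's integer-based method."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][metrics["AV"]] * WEIGHTS["AC"][metrics["AC"]]
        * pr * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total if changed else total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # prints 8.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 2.84. Their sum, roughly 8.71, rounds up to 8.8. The same vector with PR:N would score 9.8, which shows how much the authentication requirement contributes.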
Related Vulnerabilities
| CVE | CVSS | Summary | Relation |
|---|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server | Same package: vllm |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue | Same package: vllm |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access | Same package: vllm |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference | Same package: vllm |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ | Same package: vllm |