CVE-2025-46722: vLLM: image hash collision enables multimodal cache leakage

GHSA-c65p-x677-fgj6 HIGH
Published May 29, 2025
CISO Take

If your organization runs vLLM for multimodal LLM inference (vision models), upgrade to 0.9.0 immediately. Crafted images can collide in the multimodal cache, causing inference results intended for one user to be silently returned to another — a cross-tenant data leakage risk requiring no authentication. A patch is available and there are no known workarounds short of disabling multimodal caching.

What is the risk?

High risk for organizations running vLLM in multi-tenant or publicly exposed multimodal inference environments. CVSS 7.3 with a network-accessible, no-auth, no-user-interaction vector means any API client submitting images can potentially trigger collisions. EPSS is currently very low (about 0.3%), a low predicted likelihood of exploitation over the next 30 days, and no exploitation was known at time of publication. Even so, the attack primitive is conceptually straightforward once the bug is understood. Patch availability reduces urgency slightly, but the ease of exploitation and breadth of vLLM deployments warrant prompt action.

What systems are affected?

Package | Ecosystem | Vulnerable Range   | Patched
vLLM    | pip       | >= 0.7.0, < 0.9.0  | 0.9.0

How severe is it?

CVSS 3.1
7.3 / 10
EPSS
0.3%
chance of exploitation in 30 days
Higher than 18% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Moderate

What is the attack surface?

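Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): Low
Integrity (I): Low
Availability (A): Low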

What should I do?

  1. Upgrade vLLM to >= 0.9.0 immediately — patch is in commit 99404f53.

  2. Verify current version: pip show vllm; any 0.7.0–0.8.x is vulnerable.

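  3. If an immediate upgrade is blocked, disable multimodal caching as a temporary workaround. This page names the --disable-prefix-caching flag; option names vary between vLLM releases, so confirm the exact flag with vllm serve --help for your installed version.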

  4. In multi-tenant deployments, implement per-tenant cache namespace isolation at the application layer until patched.

  5. Review inference logs for anomalous cache hit rates on multimodal endpoints as a post-hoc indicator of exploitation.

  6. If running vLLM behind an API gateway, consider adding request fingerprinting that includes image dimensions as a compensating control.
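
Step 2 above can be automated across environments. The following is a minimal sketch, assuming the affected range stated on this page (>= 0.7.0, < 0.9.0); the helper names `release_tuple`, `is_vulnerable`, and `installed_vllm_status` are hypothetical and not part of vLLM.

```python
import re
from importlib import metadata

VULNERABLE_FROM = (0, 7, 0)  # first affected release, per this advisory
FIXED_IN = (0, 9, 0)         # first patched release, per this advisory


def release_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '0.8.5.post1' -> (0, 8, 5)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_vulnerable(version: str) -> bool:
    """True if the numeric release falls inside the affected range."""
    return VULNERABLE_FROM <= release_tuple(version) < FIXED_IN


def installed_vllm_status() -> str:
    """Describe the vllm package installed in the current environment."""
    try:
        version = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vllm is not installed in this environment"
    if is_vulnerable(version):
        return f"vllm {version}: VULNERABLE, upgrade to >= 0.9.0"
    return f"vllm {version}: outside the affected range"


if __name__ == "__main__":
    print(installed_vllm_status())
```

The sketch compares only the numeric release components, so a pre-release string such as 0.9.0rc1 is treated like 0.9.0. Treat pre-release and source builds as unverified and check them by hand.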

What does CISA's SSVC say?

Decision: Track
Exploitation: none
Automatable: No
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, Robustness and Cybersecurity
ISO 42001
8.4 - AI System Operation — Data Integrity
NIST AI RMF
MANAGE-2.2 - Risk Treatment and Residual Risk
OWASP LLM Top 10
LLM02:2025 - Sensitive Information Disclosure

Frequently Asked Questions

What is CVE-2025-46722?

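CVE-2025-46722 is a hash collision flaw in vLLM's multimodal image hashing (the MultiModalHasher class in vllm/multimodal/hasher.py), affecting versions from 0.7.0 up to but not including 0.9.0. The hash covers only an image's raw pixel bytes and omits its width, height, and mode, so two differently shaped images with the same byte sequence share a cache key. In a multi-tenant deployment this can cause inference results intended for one user to be silently returned to another, with no authentication required. The issue is fixed in 0.9.0; the only known workaround is disabling multimodal caching.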

Is CVE-2025-46722 actively exploited?

No confirmed active exploitation of CVE-2025-46722 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-46722?

  1. Upgrade vLLM to >= 0.9.0 immediately; the patch is in commit 99404f53.
  2. Verify the current version with `pip show vllm`; any 0.7.0–0.8.x release is vulnerable.
  3. If an immediate upgrade is blocked, disable multimodal caching as a temporary workaround. This page names the --disable-prefix-caching flag; option names vary between vLLM releases, so confirm the exact flag with `vllm serve --help` for your installed version.
  4. In multi-tenant deployments, implement per-tenant cache namespace isolation at the application layer until patched.
  5. Review inference logs for anomalous cache hit rates on multimodal endpoints as a post-hoc indicator of exploitation.
  6. If running vLLM behind an API gateway, consider adding request fingerprinting that includes image dimensions as a compensating control.
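
The fingerprinting and log-review ideas in the steps above could look roughly like the following at a gateway or application layer. This is a sketch under stated assumptions: the class `ShapeCollisionMonitor`, its methods, and the `tenant_id` parameter are hypothetical, SHA-256 is an arbitrary choice, and the code presumes the gateway has already decoded each image into mode, dimensions, and raw pixel bytes. It does not change vLLM's internal cache keys.

```python
import hashlib
from collections import defaultdict


class ShapeCollisionMonitor:
    """Flags images whose pixel bytes were previously seen under another shape."""

    def __init__(self) -> None:
        # pixel-only digest -> set of (mode, width, height) observed with it
        self._shapes_by_digest = defaultdict(set)

    @staticmethod
    def fingerprint(tenant_id: str, mode: str, width: int, height: int,
                    pixels: bytes) -> str:
        """Request fingerprint covering tenant, mode, and dimensions plus pixels."""
        h = hashlib.sha256()
        # repr() of a tuple gives an unambiguous, delimiter-safe prefix.
        h.update(repr((tenant_id, mode, width, height)).encode("utf-8"))
        h.update(pixels)
        return h.hexdigest()

    def observe(self, mode: str, width: int, height: int, pixels: bytes) -> bool:
        """Record one image; return True if its bytes arrive under a new shape."""
        digest = hashlib.sha256(pixels).hexdigest()
        shapes = self._shapes_by_digest[digest]
        shape = (mode, width, height)
        suspicious = bool(shapes) and shape not in shapes
        shapes.add(shape)
        return suspicious


monitor = ShapeCollisionMonitor()
raw = bytes(3000)  # 3000 identical bytes, valid as 30x100 or 100x30 in mode "L"
print(monitor.observe("L", 30, 100, raw))   # False: first time these bytes appear
print(monitor.observe("L", 100, 30, raw))   # True: same bytes, different shape
```

The monitor keeps every digest in memory, so a real deployment would need eviction or a bounded store. A flagged request is only a signal worth logging, since identical bytes under two shapes can occasionally occur legitimately (for example, solid-color images).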

What systems are affected by CVE-2025-46722?

This vulnerability affects the following AI/ML architecture patterns: Multimodal inference serving, Vision-language model deployments, Multi-tenant LLM APIs, LLM serving infrastructure.

What is the CVSS score for CVE-2025-46722?

CVE-2025-46722 has a CVSS v3.1 base score of 7.3 (HIGH). The EPSS exploitation probability is 0.27%.

What is the AI security impact?

Affected AI Architectures

Multimodal inference serving
Vision-language model deployments
Multi-tenant LLM APIs
LLM serving infrastructure

MITRE ATLAS Techniques

AML.T0031 Erode AI Model Integrity
AML.T0040 AI Model Inference API Access
AML.T0049 Exploit Public-Facing Application
AML.T0057 LLM Data Leakage

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: 8.4
NIST AI RMF: MANAGE-2.2
OWASP LLM Top 10: LLM02:2025

What are the technical details?

Original Advisory

vLLM is an inference and serving engine for large language models (LLMs). In versions starting from 0.7.0 to before 0.9.0, in the file vllm/multimodal/hasher.py, the MultiModalHasher class has a security and data integrity issue in its image hashing method. Currently, it serializes PIL.Image.Image objects using only obj.tobytes(), which returns only the raw pixel data, without including metadata such as the image’s shape (width, height, mode). As a result, two images of different sizes (e.g., 30x100 and 100x30) with the same pixel byte sequence could generate the same hash value. This may lead to hash collisions, incorrect cache hits, and even data leakage or security risks. This issue has been patched in version 0.9.0.
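
The collision described in the advisory can be reproduced without vLLM or Pillow. The sketch below is an illustration, not the vLLM code or its actual patch: `FakeImage` is a hypothetical stand-in for PIL.Image.Image, SHA-256 is an assumed hash function, and `hash_with_metadata` shows one possible way to include shape and mode in the key.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeImage:
    """Hypothetical stand-in for PIL.Image.Image: mode, size, raw pixel bytes."""
    mode: str
    width: int
    height: int
    pixels: bytes

    def tobytes(self) -> bytes:
        # Mirrors the behavior the advisory describes: pixel data only,
        # with no width, height, or mode included.
        return self.pixels


def hash_pixels_only(img: FakeImage) -> str:
    """Vulnerable pattern: the key depends on the raw pixel bytes alone."""
    return hashlib.sha256(img.tobytes()).hexdigest()


def hash_with_metadata(img: FakeImage) -> str:
    """Safer pattern: mix mode and dimensions into the key before the pixels."""
    h = hashlib.sha256()
    h.update(f"{img.mode}:{img.width}x{img.height}:".encode("ascii"))
    h.update(img.tobytes())
    return h.hexdigest()


# The same 3000 bytes of single-channel data declared under two shapes.
raw = bytes(i % 256 for i in range(30 * 100))
tall = FakeImage(mode="L", width=30, height=100, pixels=raw)
wide = FakeImage(mode="L", width=100, height=30, pixels=raw)

print(hash_pixels_only(tall) == hash_pixels_only(wide))      # True: collision
print(hash_with_metadata(tall) == hash_with_metadata(wide))  # False: distinct keys
```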

Exploitation Scenario

An adversary with access to a multi-tenant vLLM multimodal API submits a 100x30 image whose raw pixel bytes match those of a 30x100 image that a prior user submitted. Because the hash ignores dimensions, both images map to the same cache key, and the attacker can receive the result cached for the victim's image, potentially revealing the prior user's prompt context, processed image interpretation, or model output. The collision requires the same byte sequence laid out under different dimensions; simply rotating or transposing an image reorders its bytes and generally does not collide. In a production API serving multiple tenants, an attacker could systematically probe with reshaped variants of common or guessable images (the same bytes declared as landscape vs. portrait), fishing for collisions that surface other tenants' inference results. No authentication bypass, elevated privileges, or specialized AI knowledge is required, only knowledge of the bug and basic image manipulation tooling.
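
A toy model makes the cross-user effect in this scenario concrete. The sketch assumes a plain dictionary as the cache and invented helper names (`pixel_only_key`, `shape_aware_key`, `serve`); it models the failure mode only and does not reflect vLLM's real cache implementation.

```python
import hashlib


def pixel_only_key(width: int, height: int, pixels: bytes) -> str:
    """Vulnerable pattern: width and height are ignored when building the key."""
    return hashlib.sha256(pixels).hexdigest()


def shape_aware_key(width: int, height: int, pixels: bytes) -> str:
    """Key that also covers the declared dimensions."""
    return hashlib.sha256(f"{width}x{height}:".encode("ascii") + pixels).hexdigest()


def serve(cache: dict, key_fn, user: str, width: int, height: int,
          pixels: bytes) -> str:
    """Return the cached result for this image, computing it on a miss."""
    key = key_fn(width, height, pixels)
    if key not in cache:
        cache[key] = f"result computed for {user}'s {width}x{height} image"
    return cache[key]


raw = bytes(3000)  # identical byte sequence for both requests

leaky = {}
serve(leaky, pixel_only_key, "victim", 30, 100, raw)
print(serve(leaky, pixel_only_key, "attacker", 100, 30, raw))
# result computed for victim's 30x100 image

isolated = {}
serve(isolated, shape_aware_key, "victim", 30, 100, raw)
print(serve(isolated, shape_aware_key, "attacker", 100, 30, raw))
# result computed for attacker's 100x30 image
```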

Weaknesses (CWE)

CWE-1023 — Incomplete Comparison with Missing Factors: The product performs a comparison between entities that must consider multiple factors or characteristics of each entity, but the comparison does not include one or more of these factors.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L
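
The 7.3 base score follows from this vector under the CVSS v3.1 base-score equations. As a minimal sketch, the code below uses the published metric weights and handles only Scope: Unchanged vectors like this one; the function names `roundup` and `base_score` are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer method from the spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS 3.1 vector with Scope: Unchanged."""
    parts = dict(item.split(":") for item in vector.split("/"))
    if parts.get("CVSS") != "3.1" or parts.get("S") != "U":
        raise NotImplementedError("sketch covers CVSS 3.1 with S:U only")
    w = {metric: WEIGHTS[metric][parts[metric]] for metric in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L"))  # 7.3
```

Here the impact sub-score is about 3.37 and exploitability about 3.89; their sum of roughly 7.26 rounds up to 7.3.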

Timeline

Published
May 29, 2025
Last Modified
June 24, 2025
First Seen
May 29, 2025

Related Vulnerabilities