CVE-2025-25183: vLLM: hash collision enables prefix cache poisoning

GHSA-rm76-4mrf-v9r8 LOW
Published February 7, 2025
CISO Take

vLLM deployments on Python 3.12+ using prefix caching are vulnerable to cache poisoning via crafted hash collisions, causing users to receive responses generated from attacker-controlled prompt contexts. The attack requires low-privilege API access and knowledge of target prompts, limiting real-world risk—but multi-tenant shared inference infrastructure faces meaningful integrity exposure. Upgrade all vLLM instances to 0.7.2 immediately; if patching is delayed, disable prefix caching as a stopgap.

Risk Assessment

Low operational risk (CVSS 2.6, EPSS 0.32%) but notable as a concrete AI inference integrity attack vector. High attack complexity and required user interaction significantly constrain exploitability. Python 3.12 changed hash(None) to a predictable constant value, removing a source of run-to-run variation and making deliberate collisions more feasible than on earlier Python versions. Risk escalates in multi-tenant vLLM deployments where multiple organizations or users share inference infrastructure: there, a cache integrity failure carries downstream compliance and trust implications beyond its raw CVSS score.
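
To make the Python dependency concrete, the snippet below (an illustrative check, not from the advisory) can be run twice on the same machine: on Python 3.12+ the hash(None) line prints the same value in both runs, while the string hash still changes per process unless PYTHONHASHSEED is pinned.

```python
import sys

# Illustrative check of the Python 3.12 behavior the advisory describes.
# On 3.12+, hash(None) is a predictable constant, identical across
# interpreter runs; string hashes remain salted per process unless
# PYTHONHASHSEED is fixed.
print("Python:", sys.version.split()[0])
print("hash(None):", hash(None))   # stable across runs on 3.12+
print("hash('x') :", hash("x"))    # varies between runs by default
```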

Affected Systems

Package   Ecosystem   Vulnerable Range   Patched
vllm      pip         < 0.7.2            0.7.2

Severity & Risk

CVSS 3.1: 2.6 / 10 (LOW)
EPSS: 0.32% chance of exploitation in 30 days (higher than 55% of all CVEs)
Exploitation Status: No known exploitation
Sophistication: Moderate

Attack Surface

AV (Attack Vector): Network
AC (Attack Complexity): High
PR (Privileges Required): Low
UI (User Interaction): Required
S (Scope): Unchanged
C (Confidentiality): None
I (Integrity): Low
A (Availability): None

Recommended Action

  1. Upgrade vLLM to >= 0.7.2 (patch available; breaking changes are minimal).

  2. If immediate patching is blocked, disable prefix caching. It is controlled by the enable_prefix_caching engine argument (--enable-prefix-caching on the CLI); ensure it is unset or set to False. This eliminates the attack surface at the cost of cache performance (see the sketch after this list).

  3. Audit Python versions across the inference fleet; Python 3.12+ deployments carry elevated risk due to predictable hash(None).

  4. For multi-tenant deployments, consider tenant-isolated vLLM instances until the patch is applied.

  5. Monitor inference logs for anomalous response inconsistencies or unexpected prompt-to-output mismatches that may indicate cache pollution.

  6. Review API access logs for unusual high-volume or repetitive request patterns suggesting hash-collision probing.
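
A minimal remediation sketch for steps 1 and 2, assuming a Python-managed deployment (the model name below is a placeholder, not from the advisory):

```python
# Step 1 (shell): upgrade to the patched release.
#   pip install --upgrade "vllm>=0.7.2"

# Step 2 (stopgap on builds that cannot be patched yet): construct the
# engine with prefix caching explicitly off. In vLLM this is the
# enable_prefix_caching engine argument (--enable-prefix-caching on the CLI).
from vllm import LLM

llm = LLM(
    model="your-org/your-model",   # placeholder model identifier
    enable_prefix_caching=False,   # keep the vulnerable code path disabled
)
```

Once 0.7.2 is rolled out across the fleet, prefix caching can be re-enabled to recover the performance benefit.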

CISA SSVC Assessment

Decision: Track
Exploitation: none
Automatable: No
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act: Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001: A.6.2.3 - AI system data quality
NIST AI RMF: MANAGE-2.2 - Mechanisms to sustain value and manage AI risks
OWASP LLM Top 10: LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2025-25183?

vLLM deployments on Python 3.12+ using prefix caching are vulnerable to cache poisoning via crafted hash collisions, causing users to receive responses generated from attacker-controlled prompt contexts. The attack requires low-privilege API access and knowledge of target prompts, limiting real-world risk—but multi-tenant shared inference infrastructure faces meaningful integrity exposure. Upgrade all vLLM instances to 0.7.2 immediately; if patching is delayed, disable prefix caching as a stopgap.

Is CVE-2025-25183 actively exploited?

No confirmed active exploitation of CVE-2025-25183 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-25183?

1. Upgrade vLLM to >= 0.7.2 (patch available; breaking changes are minimal).
2. If immediate patching is blocked, disable prefix caching by leaving the enable_prefix_caching engine argument (--enable-prefix-caching on the CLI) unset or set to False; this eliminates the attack surface at the cost of cache performance.
3. Audit Python versions across the inference fleet; Python 3.12+ deployments carry elevated risk due to predictable hash(None).
4. For multi-tenant deployments, consider tenant-isolated vLLM instances until the patch is applied.
5. Monitor inference logs for anomalous response inconsistencies or unexpected prompt-to-output mismatches that may indicate cache pollution.
6. Review API access logs for unusual high-volume or repetitive request patterns suggesting hash-collision probing.

What systems are affected by CVE-2025-25183?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving (vLLM), Multi-tenant model serving, KV cache / prefix caching systems, RAG pipelines with vLLM backend, AI agent frameworks using vLLM as inference engine.

What is the CVSS score for CVE-2025-25183?

CVE-2025-25183 has a CVSS v3.1 base score of 2.6 (LOW). The EPSS exploitation probability is 0.32%.

Technical Details

NVD Description

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Maliciously constructed statements can lead to hash collisions, resulting in cache reuse, which can interfere with subsequent responses and cause unintended behavior. Prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash(None) has changed to be a predictable constant value. This makes it more feasible that someone could try to exploit hash collisions. The impact of a collision would be using cache that was generated using different content. Given knowledge of prompts in use and predictable hashing behavior, someone could intentionally populate the cache using a prompt known to collide with another prompt in use. This issue has been addressed in version 0.7.2 and all users are advised to upgrade. There are no known workarounds for this vulnerability.

Exploitation Scenario

An adversary with low-privilege API access to a shared vLLM serving endpoint (Python 3.12+) identifies commonly used system prompts through prior knowledge or observation. They craft a prompt engineered to produce a hash collision with a target prompt's prefix, a task made more tractable on Python 3.12 by hash(None) predictability. By submitting the crafted prompt first to warm the KV cache, the attacker plants a poisoned context. When a legitimate user subsequently sends the target prompt, vLLM's prefix cache returns the attacker's stored context instead of computing a fresh prefix. The model then responds as if conditioned on the attacker's prompt, potentially producing misleading, policy-violating, or manipulated outputs, with no visible signal to the end user that cache reuse occurred.
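
A toy model of the failure mode described above, assuming nothing about vLLM's internal block structure (the cache below is a deliberately simplified stand-in, not vLLM's implementation): it keys entries on Python's built-in hash() alone, so any two prefixes whose hashes collide silently share a cache entry.

```python
from typing import Callable

# Deliberately simplified stand-in for a prefix cache -- NOT vLLM's code.
# Keying on hash(prefix) without comparing the full prefix means a hash
# collision silently serves one prompt's cached context for another.
class ToyPrefixCache:
    def __init__(self) -> None:
        self._entries: dict[int, str] = {}

    def get_or_compute(self, prefix: tuple, compute: Callable[[tuple], str]) -> str:
        key = hash(prefix)  # a collision here means poisoned reuse
        if key not in self._entries:
            self._entries[key] = compute(prefix)
        return self._entries[key]

cache = ToyPrefixCache()

# 1. The attacker warms the cache with a crafted prefix.
cache.get_or_compute(("attacker-crafted", "prefix"), lambda p: f"context({p})")

# 2. If a victim's different prefix collides on hash(), the victim is
#    served the attacker's stored context instead of a fresh computation.
victim = cache.get_or_compute(("victim", "prefix"), lambda p: f"context({p})")
print(victim)  # fresh here, but would be the attacker's entry on collision
```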

CVSS Vector

CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:N/I:L/A:N

Timeline

Published
February 7, 2025
Last Modified
July 2, 2025
First Seen
February 7, 2025
