CVE-2025-25183: vLLM: hash collision enables prefix cache poisoning

GHSA-rm76-4mrf-v9r8 LOW
Published February 7, 2025
CISO Take

vLLM deployments that use prefix caching are vulnerable to cache poisoning through crafted hash collisions, which can cause users to receive responses generated from an attacker-controlled prompt context. Python 3.12+ makes the attack more feasible because hash(None) is a predictable constant there. The attack requires low-privilege API access and knowledge of target prompts, which limits real-world risk, but multi-tenant shared inference infrastructure faces meaningful integrity exposure. Upgrade all vLLM instances to 0.7.2 or later; if patching is delayed, disable prefix caching as a stopgap.

What is the risk?

Low operational risk (CVSS 2.6, EPSS 0.36%), but notable as a concrete integrity attack vector against AI inference. High attack complexity and required user interaction significantly constrain exploitability. As of Python 3.12, hash(None) is a predictable constant, which makes deliberate collisions more feasible than on earlier Python versions. Risk escalates in multi-tenant vLLM deployments where multiple organizations or users share inference infrastructure: there, a cache integrity failure carries downstream compliance and trust implications beyond its raw CVSS score.

What systems are affected?

Package   Ecosystem   Vulnerable Range   Patched
vLLM      pip         -                  No patch
vLLM      pip         < 0.7.2            0.7.2

How severe is it?

CVSS 3.1: 2.6 / 10
EPSS: 0.2% chance of exploitation in 30 days (higher than 7% of all CVEs)
Exploitation Status: No known exploitation
Sophistication: Moderate

What is the attack surface?

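Attack Vector (AV): Network
Attack Complexity (AC): High
Privileges Required (PR): Low
User Interaction (UI): Required
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): Low
Availability (A): None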

What should I do?

  1. Upgrade vLLM to >= 0.7.2 (patch available, breaking change is minimal).

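  2. If immediate patching is blocked, disable prefix caching via the prefix-caching flag or equivalent config key for your vLLM version. The exact flag name differs across releases (in some versions caching is opt-in through --enable-prefix-caching), so confirm it against the CLI help for your release. This removes the attack surface at the cost of cache performance.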

  3. Audit Python version across inference fleet; Python 3.12+ deployments carry elevated risk due to predictable hash(None).

  4. For multi-tenant deployments, consider tenant-isolated vLLM instances until patch is applied.

  5. Monitor inference logs for anomalous response inconsistencies or unexpected prompt-to-output mismatches that may indicate cache pollution.

  6. Review API access logs for unusual high-volume or repetitive request patterns suggesting hash-collision probing.
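Steps 1 and 3 can be checked per host with a short script. The following is a minimal sketch, assuming vLLM is installed as the pip distribution named vllm and that comparing the leading numeric version components is sufficient (pre-release and post-release suffixes are ignored). The helper names are illustrative and are not part of vLLM.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple

# First patched release according to the advisory.
PATCHED_RELEASE = (0, 7, 2)


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '0.6.6.post1' -> (0, 6, 6)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_vulnerable_version(version: str) -> bool:
    """True when the version sorts below the patched release."""
    return parse_release(version) < PATCHED_RELEASE


def audit_host() -> dict:
    """Collect the two facts the remediation steps ask about."""
    installed: Optional[str]
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        installed = None
    return {
        "python": sys.version_info[:3],
        # Python 3.12+ has a constant hash(None), which raises feasibility.
        "python_predictable_none_hash": sys.version_info >= (3, 12),
        "vllm": installed,
        "vllm_vulnerable": None if installed is None else is_vulnerable_version(installed),
    }


if __name__ == "__main__":
    for key, value in audit_host().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter that serves inference, since the Python version check applies to that interpreter rather than to the system default.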

What does CISA's SSVC say?

Decision: Track
Exploitation: none
Automatable: No
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Art.15 - Accuracy, robustness and cybersecurity
ISO 42001: A.6.2.3 - AI system data quality
NIST AI RMF: MANAGE-2.2 - Mechanisms to sustain value and manage AI risks
OWASP LLM Top 10: LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2025-25183?

CVE-2025-25183 is a low-severity integrity flaw in vLLM's prefix caching, which relies on Python's built-in hash() function. A prompt crafted to collide with another prompt's hash can cause vLLM to reuse cache that was generated from different content, so users may receive responses conditioned on an attacker-controlled prompt context. Python 3.12+ makes such collisions more feasible because hash(None) is a predictable constant. The issue is fixed in vLLM 0.7.2.

Is CVE-2025-25183 actively exploited?

No confirmed active exploitation of CVE-2025-25183 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-25183?

  1. Upgrade vLLM to >= 0.7.2 (patch available, breaking change is minimal).

  2. If immediate patching is blocked, disable prefix caching via the prefix-caching flag or equivalent config key for your vLLM version. The exact flag name differs across releases (in some versions caching is opt-in through --enable-prefix-caching), so confirm it against the CLI help for your release. This removes the attack surface at the cost of cache performance.

  3. Audit Python version across inference fleet; Python 3.12+ deployments carry elevated risk due to predictable hash(None).

  4. For multi-tenant deployments, consider tenant-isolated vLLM instances until patch is applied.

  5. Monitor inference logs for anomalous response inconsistencies or unexpected prompt-to-output mismatches that may indicate cache pollution.

  6. Review API access logs for unusual high-volume or repetitive request patterns suggesting hash-collision probing.

What systems are affected by CVE-2025-25183?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving (vLLM), Multi-tenant model serving, KV cache / prefix caching systems, RAG pipelines with vLLM backend, AI agent frameworks using vLLM as inference engine.

What is the CVSS score for CVE-2025-25183?

CVE-2025-25183 has a CVSS v3.1 base score of 2.6 (LOW). The EPSS exploitation probability is 0.18%.

What is the AI security impact?

Affected AI Architectures

LLM inference serving (vLLM)
Multi-tenant model serving
KV cache / prefix caching systems
RAG pipelines with vLLM backend
AI agent frameworks using vLLM as inference engine

MITRE ATLAS Techniques

AML.T0006 Active Scanning
AML.T0031 Erode AI Model Integrity
AML.T0040 AI Model Inference API Access
AML.T0043.003 Manual Modification
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art.15
ISO 42001: A.6.2.3
NIST AI RMF: MANAGE-2.2
OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Maliciously constructed statements can lead to hash collisions, resulting in cache reuse, which can interfere with subsequent responses and cause unintended behavior. Prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash(None) has changed to be a predictable constant value. This makes it more feasible that someone could try to exploit hash collisions. The impact of a collision would be using cache that was generated using different content. Given knowledge of prompts in use and predictable hashing behavior, someone could intentionally populate the cache using a prompt known to collide with another prompt in use. This issue has been addressed in version 0.7.2 and all users are advised to upgrade. There are no known workarounds for this vulnerability.
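The predictability described in the advisory can be observed directly. The sketch below assumes a simplified block hash of the form hash((parent_hash, token_ids)) with None in the parent slot for the first block. That shape is an illustrative stand-in, not a statement of how vLLM builds its cache keys. It computes the value in two fresh interpreters started with different PYTHONHASHSEED settings. Integers and None are not affected by hash-seed randomization, so on Python 3.12+ the two values should match. On earlier versions they may differ, because hash(None) was derived from the object's memory address.

```python
import os
import subprocess
import sys

# Toy first-block hash: None stands in for the missing parent block.
# The token IDs are arbitrary example values.
SNIPPET = "print(hash((None, (101, 2023, 2003))))"


def first_block_hash(hash_seed: str) -> int:
    """Compute the toy block hash in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    first = first_block_hash("1")
    second = first_block_hash("2")
    verdict = "stable across processes" if first == second else "differs across processes"
    print(sys.version_info[:2], first, second, verdict)
```

A value that is stable across processes is what lets an attacker search for colliding inputs offline and expect them to collide on the target server as well.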

Exploitation Scenario

An adversary with low-privilege API access to a shared vLLM serving endpoint (Python 3.12+) identifies commonly-used system prompts through prior knowledge or observation. They craft a prompt engineered to produce a hash collision with a target prompt's prefix—made more tractable on Python 3.12 due to hash(None) predictability. By submitting the crafted prompt first to warm the KV cache, the attacker plants a poisoned context. When a legitimate user subsequently sends the target prompt, vLLM's prefix cache returns the attacker's stored context instead of computing a fresh prefix. The model then responds as if conditioned on the attacker's prompt, potentially producing misleading, policy-violating, or manipulated outputs—without any visible signal to the end user that cache reuse occurred.
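The sequence in this scenario can be reproduced with a toy cache. This is a minimal sketch under stated assumptions: HashKeyedPrefixCache and fake_prefill are hypothetical names, the cache is keyed only on a hash() value, and the collision uses the CPython detail that hash(-1) and hash(-2) are both -2. Real token IDs are non-negative, so the colliding pair here is only a convenient stand-in for a collision an attacker would have to search for.

```python
from typing import Callable, Dict, Tuple


class HashKeyedPrefixCache:
    """Toy prefix cache that trusts the hash value alone as the lookup key."""

    def __init__(self) -> None:
        self._blocks: Dict[int, str] = {}

    def lookup_or_fill(
        self,
        token_ids: Tuple[int, ...],
        compute: Callable[[Tuple[int, ...]], str],
    ) -> str:
        # Only the integer hash is stored, so two different prompts with the
        # same hash are indistinguishable to the cache.
        key = hash((None, token_ids))
        if key not in self._blocks:
            self._blocks[key] = compute(token_ids)
        return self._blocks[key]


def fake_prefill(token_ids: Tuple[int, ...]) -> str:
    """Stand-in for the expensive prefix computation."""
    return f"KV state for {token_ids}"


# In CPython, hash(-1) == hash(-2), so these two tuples hash identically.
ATTACKER_PROMPT = (7, 8, -1)
VICTIM_PROMPT = (7, 8, -2)

if __name__ == "__main__":
    cache = HashKeyedPrefixCache()
    cache.lookup_or_fill(ATTACKER_PROMPT, fake_prefill)        # attacker warms the cache
    served = cache.lookup_or_fill(VICTIM_PROMPT, fake_prefill)  # victim arrives later
    print("victim served:", served)
    print("victim expected:", fake_prefill(VICTIM_PROMPT))
```

The victim's lookup hits the attacker's entry because nothing in the cache compares the underlying token content.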

Weaknesses (CWE)

CWE-354 — Improper Validation of Integrity Check Value: The product does not validate or incorrectly validates the integrity check values or "checksums" of a message. This may prevent it from detecting if the data has been modified or corrupted in transmission.

  • [Implementation] Ensure that the checksums present in messages are properly checked in accordance with the protocol specification before they are parsed and used.

Source: MITRE CWE corpus.
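One general way to make a cache key resistant to deliberate collisions is to derive it from a cryptographic digest of the content instead of Python's hash(). The following is a sketch of that general technique only; it is not a description of the change shipped in vLLM 0.7.2, and block_digest and PROCESS_SALT are hypothetical names. It assumes token IDs fit in a signed 64-bit integer.

```python
import hashlib
import os
from typing import Optional, Sequence

# Per-process random salt so digests cannot be precomputed offline.
PROCESS_SALT = os.urandom(32)

# Fixed marker used in place of a parent digest for the first block.
ROOT_PARENT = b"\x00" * 32


def block_digest(parent: Optional[bytes], token_ids: Sequence[int]) -> bytes:
    """SHA-256 over salt, parent digest, and fixed-width encoded token IDs."""
    h = hashlib.sha256()
    h.update(PROCESS_SALT)
    h.update(parent if parent is not None else ROOT_PARENT)
    for token in token_ids:
        # Fixed-width encoding keeps token boundaries unambiguous.
        h.update(int(token).to_bytes(8, "big", signed=True))
    return h.digest()


if __name__ == "__main__":
    first = block_digest(None, (7, 8, -1))
    other = block_digest(None, (7, 8, -2))
    chained = block_digest(first, (9, 10))
    print(first.hex()[:16], other.hex()[:16], chained.hex()[:16])
```

The trade-off is extra hashing cost per block compared with the built-in hash(), which matters on hot paths in an inference server.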

CVSS Vector

CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:N/I:L/A:N
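The 2.6 base score can be reproduced from this vector. The sketch below follows the base-score equations and metric weights from the CVSS v3.1 specification; the function names are illustrative, and temporal and environmental metrics are not handled.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on whether Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def parse_vector(vector: str) -> dict:
    """Split 'CVSS:3.1/AV:N/...' into a metric -> value mapping."""
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix}")
    return dict(metric.split(":") for metric in metrics)


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss * 0.9731 - 0.02) ** 13
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]] * pr * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:N/I:L/A:N"))
```

For this vector the impact sub-score is about 1.41 and the exploitability sub-score about 1.18; their sum of roughly 2.59 rounds up to 2.6.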

Timeline

Published: February 7, 2025
Last Modified: July 2, 2025
First Seen: February 7, 2025

Related Vulnerabilities