CVE-2025-25183 — LOW (CVSS 2.6) AI Security Vulnerability

CISO Take

vLLM versions before 0.7.2 that use prefix caching are vulnerable to cache poisoning through crafted hash collisions: a victim's request can reuse cached prefix state that was generated from an attacker-controlled prompt. Python 3.12+ deployments are the most exposed because hash(None) is a predictable constant there. The attack requires low-privilege API access and knowledge of target prompts, which limits real-world risk, but multi-tenant shared inference infrastructure faces meaningful integrity exposure. Upgrade all vLLM instances to 0.7.2 or later; if patching is delayed, disable prefix caching as a stopgap.

Risk Assessment

Low operational risk (CVSS 2.6, EPSS 0.36%), but notable as a concrete integrity attack against AI inference. High attack complexity and required user interaction significantly constrain exploitability. Before Python 3.12, hash(None) was derived from the memory address of the None object and could differ from one process to the next; from Python 3.12 it is a fixed constant, which makes deliberate collisions more feasible than on earlier versions. Risk escalates in multi-tenant vLLM deployments where multiple organizations or users share inference infrastructure: there, a cache integrity failure carries downstream compliance and trust implications beyond its raw CVSS score.
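The hash(None) behavior can be checked directly on a given host. The following is a minimal sketch, not part of vLLM: it starts a few fresh interpreters and compares the values they report. The helper names are illustrative. On Python 3.12+ the values should match; on older CPython versions they may differ between runs where address randomization is in effect.

```python
import subprocess
import sys


def hash_none_in_fresh_process() -> int:
    """Return hash(None) as computed by a newly started interpreter."""
    result = subprocess.run(
        [sys.executable, "-c", "print(hash(None))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def hash_none_is_stable(runs: int = 3) -> bool:
    """True if every fresh interpreter reported the same hash(None)."""
    values = {hash_none_in_fresh_process() for _ in range(runs)}
    return len(values) == 1


if __name__ == "__main__":
    print(sys.version_info[:2], "stable across processes:", hash_none_is_stable())
```

A stable value on its own does not mean a deployment is exploitable; it only confirms the precondition described in the advisory.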

Affected Systems

Package	Ecosystem	Vulnerable Range	Patched
vllm	pip	—	No patch
vllm	pip	< 0.7.2	`0.7.2`

Severity & Risk

CVSS 3.1: 2.6 / 10

EPSS: 0.3% chance of exploitation in 30 days (higher than 55% of all CVEs)

Source: EPSS v3 — FIRST.org

Exploitation Status: No known exploitation

Sophistication: Moderate

Attack Surface

AV Network

AC High

PR Low

UI Required

S Unchanged

C None

I Low

A None
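
The 2.6 base score follows from these metrics. As a worked check, the sketch below applies the CVSS v3.1 base-score equations for the Scope: Unchanged case, using the metric weights from the specification. The function and table names are illustrative only.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest number with one decimal place that is >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a vector with Scope: Unchanged."""
    w = WEIGHTS
    iss = 1 - (1 - w["CIA"][c]) * (1 - w["CIA"][i]) * (1 - w["CIA"][a])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"][av] * w["AC"][ac] * w["PR"][pr] * w["UI"][ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:N / AC:H / PR:L / UI:R / S:U / C:N / I:L / A:N
print(base_score_unchanged("N", "H", "L", "R", "N", "L", "N"))  # 2.6
```

Here the impact sub-score is 6.42 × 0.22 ≈ 1.41 and the exploitability sub-score is about 1.18, so the sum of roughly 2.59 rounds up to 2.6.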

Recommended Action


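Upgrade vLLM to 0.7.2 or later (a patch is available and the upgrade is reported to involve minimal breaking changes).
If immediate patching is blocked, disable prefix caching. The exact option depends on the vLLM version: in affected releases this generally means not passing --enable-prefix-caching, or passing --no-enable-prefix-caching where the installed version accepts it, so check the CLI help of the installed version. This removes the attack surface at the cost of cache performance. Note that the NVD description lists no official workaround.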
Audit Python version across inference fleet; Python 3.12+ deployments carry elevated risk due to predictable hash(None).
For multi-tenant deployments, consider tenant-isolated vLLM instances until patch is applied.
Monitor inference logs for anomalous response inconsistencies or unexpected prompt-to-output mismatches that may indicate cache pollution.
Review API access logs for unusual high-volume or repetitive request patterns suggesting hash-collision probing.
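
The version audit in the steps above can be sketched as follows. This is a minimal example under stated assumptions: it only inspects the interpreter it runs in, the function names are illustrative, and the version parser ignores pre-release and dev suffixes (a stricter comparison would use a full PEP 440 parser).

```python
import sys
from importlib import metadata

PATCHED = (0, 7, 2)  # first fixed release according to the advisory


def parse_release(version: str) -> tuple:
    """Parse the leading numeric release segment, e.g. '0.7.1.post1' -> (0, 7, 1).

    Pre-release and dev suffixes are ignored, so '0.7.2rc1' compares as (0, 7, 2).
    """
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_vllm_version():
    """Return the installed vllm version string, or None if it is not installed."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


def assess(vllm_version, python_version=sys.version_info[:2]) -> str:
    """Classify one environment against the advisory's version range."""
    if vllm_version is None:
        return "vllm not installed"
    if parse_release(vllm_version) >= PATCHED:
        return "patched"
    if tuple(python_version) >= (3, 12):
        return "vulnerable (elevated: Python 3.12+ has a constant hash(None))"
    return "vulnerable"


if __name__ == "__main__":
    print(assess(installed_vllm_version()))
```

Running this inside each inference image or virtual environment gives a quick per-host status; it does not detect whether prefix caching is actually enabled.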

CISA SSVC Assessment

Decision Track

Exploitation none

Automatable No

Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Adversarial Examples
Data Leakage
Inference Framework
AML.T0006 - Active Scanning
AML.T0031 - Erode AI Model Integrity
AML.T0040 - AI Model Inference API Access
AML.T0043.003 - Manual Modification
AML.T0049 - Exploit Public-Facing Application

Compliance Impact

This CVE is relevant to:

EU AI Act

Art.15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2.3 - AI system data quality

NIST AI RMF

MANAGE-2.2 - Mechanisms to sustain value and manage AI risks

OWASP LLM Top 10

LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2025-25183?

CVE-2025-25183 is a cache-poisoning vulnerability in vLLM's prefix caching, fixed in version 0.7.2. Prefix caching keys cached state with Python's built-in hash() function, so a crafted prompt whose hash collides with another prompt's can cause a later request to reuse cache generated from different, attacker-controlled content. Python 3.12+ deployments are the most exposed because hash(None) is a predictable constant there. The attack requires low-privilege API access and knowledge of target prompts, which limits real-world risk, but multi-tenant shared inference infrastructure faces meaningful integrity exposure.

Is CVE-2025-25183 actively exploited?

No confirmed active exploitation of CVE-2025-25183 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-25183?

1. Upgrade vLLM to 0.7.2 or later (a patch is available and the upgrade is reported to involve minimal breaking changes).
2. If immediate patching is blocked, disable prefix caching. The exact option depends on the vLLM version: in affected releases this generally means not passing --enable-prefix-caching, or passing --no-enable-prefix-caching where the installed version accepts it. This removes the attack surface at the cost of cache performance.
3. Audit the Python version across the inference fleet; Python 3.12+ deployments carry elevated risk due to the predictable hash(None).
4. For multi-tenant deployments, consider tenant-isolated vLLM instances until the patch is applied.
5. Monitor inference logs for anomalous response inconsistencies or unexpected prompt-to-output mismatches that may indicate cache pollution.
6. Review API access logs for unusual high-volume or repetitive request patterns suggesting hash-collision probing.

What systems are affected by CVE-2025-25183?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving (vLLM), Multi-tenant model serving, KV cache / prefix caching systems, RAG pipelines with vLLM backend, AI agent frameworks using vLLM as inference engine.

What is the CVSS score for CVE-2025-25183?

CVE-2025-25183 has a CVSS v3.1 base score of 2.6 (LOW). The EPSS exploitation probability is 0.32%.

Technical Details

NVD Description

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Maliciously constructed statements can lead to hash collisions, resulting in cache reuse, which can interfere with subsequent responses and cause unintended behavior. Prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash(None) has changed to be a predictable constant value. This makes it more feasible that someone could try to exploit hash collisions. The impact of a collision would be using cache that was generated using different content. Given knowledge of prompts in use and predictable hashing behavior, someone could intentionally populate the cache using a prompt known to collide with another prompt in use. This issue has been addressed in version 0.7.2 and all users are advised to upgrade. There are no known workarounds for this vulnerability.
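
To make the role of hash(None) concrete, here is a simplified illustration of a block-wise prefix hash built on Python's hash(). It is a sketch under assumptions, not vLLM's actual code: the function names, the block size, and the exact tuple layout are hypothetical. The point it shows is that the first block has no parent, so the chain is seeded with None, and every later key depends on that seed.

```python
def block_hash(parent_hash, token_ids):
    """Key for one block: built-in hash() over (parent key, block tokens)."""
    return hash((parent_hash, tuple(token_ids)))


def prefix_block_hashes(token_ids, block_size=4):
    """Hash each full block, chaining from the previous block's key.

    The first block has no parent, so the chain starts from None.
    Trailing tokens that do not fill a block are not hashed.
    """
    hashes = []
    parent = None
    full = len(token_ids) - len(token_ids) % block_size
    for start in range(0, full, block_size):
        parent = block_hash(parent, token_ids[start:start + block_size])
        hashes.append(parent)
    return hashes


a = prefix_block_hashes([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = prefix_block_hashes([1, 2, 3, 4, 9, 9, 9, 9])
print(a[0] == b[0])  # True: identical first block, identical key
print(a[1] == b[1])  # expected to differ; equality here would be a collision
```

In CPython, integers and tuples of integers hash deterministically, so in a scheme like this hash(None) is the only input that could vary between processes. Once it is constant, an attacker can evaluate candidate keys offline.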

Exploitation Scenario

An adversary with low-privilege API access to a shared vLLM serving endpoint (Python 3.12+) identifies commonly used system prompts through prior knowledge or observation. They craft a prompt engineered to produce a hash collision with the prefix of a target prompt, which is more tractable on Python 3.12 because hash(None) is predictable. By submitting the crafted prompt first, the attacker warms the KV cache and plants a poisoned entry. When a legitimate user subsequently sends the target prompt, vLLM's prefix cache reuses the attacker's stored context instead of computing the prefix afresh. The model then responds as if conditioned on the attacker's prompt, potentially producing misleading, policy-violating, or manipulated outputs, with no visible signal to the end user that cache reuse occurred.
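
The effect of a collision can be reproduced with a toy cache. The sketch below is an illustration only: it uses a deliberately tiny 8-bit key so that a collision is trivial to construct, whereas a real attack would need a collision in Python's 64-bit hash. The class and function names are hypothetical, and the two mitigations shown (checking the stored tokens on a hit, or using SHA-256 keys) are generic defenses, not a description of the change made in vLLM 0.7.2.

```python
import hashlib


def weak_key(token_ids):
    """Deliberately tiny 8-bit key so collisions are easy to construct (toy only)."""
    return sum(token_ids) % 256


def strong_key(token_ids):
    """SHA-256 over the serialized tokens, shown only as a contrast."""
    data = b",".join(str(t).encode() for t in token_ids)
    return hashlib.sha256(data).hexdigest()


class PrefixCache:
    def __init__(self, key_fn, verify=False):
        self.key_fn = key_fn
        self.verify = verify  # if True, compare stored tokens before reusing an entry
        self.store = {}       # key -> (token_ids, cached_context)

    def get_or_compute(self, token_ids, compute):
        key = self.key_fn(token_ids)
        hit = self.store.get(key)
        if hit is not None and (not self.verify or hit[0] == tuple(token_ids)):
            return hit[1]
        context = compute(token_ids)
        self.store[key] = (tuple(token_ids), context)
        return context


def compute(ids):
    """Stand-in for the expensive prefix computation."""
    return f"context for {list(ids)}"


victim = [10, 20, 30]
attacker = [30, 20, 10]  # different prompt, same weak key (sum is 60)

cache = PrefixCache(weak_key)
cache.get_or_compute(attacker, compute)       # attacker warms the cache
print(cache.get_or_compute(victim, compute))  # context for [30, 20, 10]
```

With `PrefixCache(weak_key, verify=True)` or `PrefixCache(strong_key)`, the same sequence returns the victim's own context, because the colliding entry is either rejected on lookup or never collides in the first place.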