vLLM deployments on Python 3.12+ using prefix caching are vulnerable to cache poisoning via crafted hash collisions, causing users to receive responses generated from attacker-controlled prompt contexts. The attack requires low-privilege API access and knowledge of target prompts, limiting real-world risk—but multi-tenant shared inference infrastructure faces meaningful integrity exposure. Upgrade all vLLM instances to 0.7.2 immediately; if patching is delayed, disable prefix caching as a stopgap.
What is the risk?
Low operational risk (CVSS 2.6, EPSS 0.36%), but notable as a concrete attack on AI inference integrity. High attack complexity and required user interaction significantly constrain exploitability. As of Python 3.12, hash(None) returns a predictable constant, which makes deliberate collisions more feasible than on earlier Python versions. Risk escalates in multi-tenant vLLM deployments where multiple organizations or users share inference infrastructure: there, a cache integrity failure carries downstream compliance and trust implications beyond its raw CVSS score.
What should I do?
1. Upgrade vLLM to 0.7.2 or later (patch available; breaking changes are minimal).
2. If immediate patching is blocked, disable prefix caching, for example by removing --enable-prefix-caching from the launch command (the exact option depends on the vLLM version). This removes the attack surface at the cost of cache performance. Note that the upstream advisory lists no official workaround.
3. Audit the Python version across the inference fleet; Python 3.12+ deployments carry elevated risk due to predictable hash(None).
4. For multi-tenant deployments, consider tenant-isolated vLLM instances until the patch is applied.
5. Monitor inference logs for anomalous response inconsistencies or unexpected prompt-to-output mismatches that may indicate cache pollution.
6. Review API access logs for unusual high-volume or repetitive request patterns suggesting hash-collision probing.
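The version audit in the steps above can be sketched as a small script run inside each inference environment. This is a minimal sketch, not an official tool: the helper names `release_tuple` and `classify` are hypothetical, the 0.7.2 threshold comes from the advisory, and the version parsing only looks at the leading numeric segment, so pre-release strings such as "0.7.2rc1" would be treated as patched.

```python
import re
import sys
from importlib import metadata

PATCHED = (0, 7, 2)  # first fixed release according to the advisory


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '0.7.1.post1' -> (0, 7, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def classify(vllm_version: str, python_version: tuple) -> str:
    """Label an environment using the advisory's criteria (hypothetical helper)."""
    if release_tuple(vllm_version) >= PATCHED:
        return "patched"
    if tuple(python_version[:2]) >= (3, 12):
        return "vulnerable (elevated: predictable hash(None))"
    return "vulnerable"


if __name__ == "__main__":
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        print(installed, "->", classify(installed, sys.version_info))
```

Running it on each host gives a quick inventory of which environments still need the upgrade and which of those run Python 3.12 or later.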
What does CISA's SSVC say?
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
What is CVE-2025-25183?
CVE-2025-25183 is a hash-collision weakness in vLLM's prefix caching. A crafted prompt can collide with another prompt's cache key, so a later request reuses cache entries that were generated from different content. Python 3.12 and later make this more feasible because hash(None) is a predictable constant. The issue is fixed in vLLM 0.7.2.
Is CVE-2025-25183 actively exploited?
No confirmed active exploitation of CVE-2025-25183 has been reported, but organizations should still patch proactively.
How to fix CVE-2025-25183?
Upgrade vLLM to 0.7.2 or later. If patching is delayed, apply the interim measures listed under "What should I do?" above: disable prefix caching, audit Python versions across the fleet, isolate tenants, and monitor inference and API access logs.
What systems are affected by CVE-2025-25183?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving (vLLM), Multi-tenant model serving, KV cache / prefix caching systems, RAG pipelines with vLLM backend, AI agent frameworks using vLLM as inference engine.
What is the CVSS score for CVE-2025-25183?
CVE-2025-25183 has a CVSS v3.1 base score of 2.6 (LOW). The EPSS exploitation probability is 0.18%.
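The 2.6 base score can be reproduced from the published vector, CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:N/I:L/A:N. The sketch below applies the CVSS v3.1 base-score equations for the Scope: Unchanged case only; the function names are illustrative, and vectors with a changed scope are rejected rather than handled.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged values).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are supported")
    metrics = dict(part.split(":") for part in parts)
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:N/I:L/A:N"))
```

For this vector the impact sub-score is about 1.41 and the exploitability sub-score about 1.18; their sum, roughly 2.59, rounds up to 2.6. The low score is driven by high attack complexity, required user interaction, and an integrity-only, low-magnitude impact.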
What is the AI security impact?
Affected AI Architectures
MITRE ATLAS Techniques
- AML.T0006 Active Scanning
- AML.T0031 Erode AI Model Integrity
- AML.T0040 AI Model Inference API Access
- AML.T0043.003 Manual Modification
- AML.T0049 Exploit Public-Facing Application
Compliance Controls Affected
What are the technical details?
Original Advisory
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Maliciously constructed statements can lead to hash collisions, resulting in cache reuse, which can interfere with subsequent responses and cause unintended behavior. Prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash(None) has changed to be a predictable constant value. This makes it more feasible that someone could try to exploit hash collisions. The impact of a collision would be using cache that was generated using different content. Given knowledge of prompts in use and predictable hashing behavior, someone could intentionally populate the cache using a prompt known to collide with another prompt in use. This issue has been addressed in version 0.7.2 and all users are advised to upgrade. There are no known workarounds for this vulnerability.
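The hashing behavior described in the advisory can be observed directly. The following is a minimal sketch, assuming a CPython interpreter: it evaluates hash() in fresh interpreter processes started with different PYTHONHASHSEED values. String hashes are salted per process and so differ between seeds, whereas on Python 3.12 and later hash(None) is the same in every process. The helper name `hash_in_fresh_interpreter` is hypothetical.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(expression: str, seed: str) -> int:
    """Evaluate hash(<expression>) in a new interpreter with the given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expression}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    for expression in ("'system prompt'", "None"):
        values = {
            hash_in_fresh_interpreter(expression, seed) for seed in ("1", "2", "3")
        }
        verdict = "stable across processes" if len(values) == 1 else "varies"
        print(f"hash({expression}): {verdict}")
```

On interpreters older than 3.12 the result for None depends on where the object sits in memory, so it may or may not vary between runs on a given build; from 3.12 onward it is constant by design.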
Exploitation Scenario
An adversary with low-privilege API access to a shared vLLM serving endpoint (Python 3.12+) identifies commonly-used system prompts through prior knowledge or observation. They craft a prompt engineered to produce a hash collision with a target prompt's prefix—made more tractable on Python 3.12 due to hash(None) predictability. By submitting the crafted prompt first to warm the KV cache, the attacker plants a poisoned context. When a legitimate user subsequently sends the target prompt, vLLM's prefix cache returns the attacker's stored context instead of computing a fresh prefix. The model then responds as if conditioned on the attacker's prompt, potentially producing misleading, policy-violating, or manipulated outputs—without any visible signal to the end user that cache reuse occurred.
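The cache-reuse step in this scenario can be illustrated with a toy model. This is not vLLM's implementation and not the fix shipped in 0.7.2: `ToyPrefixCache` is a hypothetical class that keys cached blocks only by Python's hash() of the token tuple, and the collision relies on the documented CPython rule that integers hash modulo `sys.hash_info.modulus`. The colliding value is far outside any real vocabulary; a real attack would need a collision among valid token IDs.

```python
import sys


class ToyPrefixCache:
    """Illustrative only: caches a payload per hash() of a token tuple."""

    def __init__(self, verify_tokens: bool):
        self.verify_tokens = verify_tokens
        self._blocks = {}  # hash value -> (tokens, cached payload)

    def put(self, tokens, payload):
        # First writer wins, mirroring a cache that is warmed by an early request.
        self._blocks.setdefault(hash(tuple(tokens)), (tuple(tokens), payload))

    def get(self, tokens):
        entry = self._blocks.get(hash(tuple(tokens)))
        if entry is None:
            return None
        stored_tokens, payload = entry
        if self.verify_tokens and stored_tokens != tuple(tokens):
            return None  # hash matched but content differs: treat as a miss
        return payload


# hash(0) and hash(sys.hash_info.modulus) are both 0 in CPython, and a tuple's
# hash depends only on its length and its elements' hashes, so these collide.
victim_tokens = (0, 17, 42)
attacker_tokens = (sys.hash_info.modulus, 17, 42)

naive = ToyPrefixCache(verify_tokens=False)
naive.put(attacker_tokens, "KV computed for attacker prompt")

checked = ToyPrefixCache(verify_tokens=True)
checked.put(attacker_tokens, "KV computed for attacker prompt")

if __name__ == "__main__":
    print("hashes collide:", hash(victim_tokens) == hash(attacker_tokens))
    print("naive cache returns:", naive.get(victim_tokens))
    print("checked cache returns:", checked.get(victim_tokens))
```

The naive cache hands the victim the attacker's entry because only the hash is compared. The second instance shows one possible hardening, comparing stored content before reuse; a collision-resistant key would be another.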
Weaknesses (CWE)
Primary: CWE-354 Improper Validation of Integrity Check Value
The product does not validate or incorrectly validates the integrity check values or "checksums" of a message. This may prevent it from detecting if the data has been modified or corrupted in transmission.
- [Implementation] Ensure that the checksums present in messages are properly checked in accordance with the protocol specification before they are parsed and used.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:N/I:L/A:N
References
- github.com/advisories/GHSA-rm76-4mrf-v9r8
- github.com/pypa/advisory-database/tree/main/vulns/vllm/PYSEC-2025-62.yaml
- github.com/python/cpython/pull/99541
- github.com/vllm-project/vllm/commit/73b35cca7f3745d07d439c197768b25d88b6ab7f
- nvd.nist.gov/vuln/detail/CVE-2025-25183
- github.com/python/cpython/commit/432117cd1f59c76d97da2eaff55a7d758301dbc7 (Not Applicable)
- github.com/vllm-project/vllm/pull/12621 (Issue)
- github.com/vllm-project/vllm/security/advisories/GHSA-rm76-4mrf-v9r8 (Vendor)
Related Vulnerabilities
- CVE-2024-9053 (9.8) vllm: RCE via unsafe pickle deserialization in RPC server. Same package: vllm
- CVE-2025-47277 (9.8) vLLM: RCE via exposed TCPStore in distributed inference. Same package: vllm
- CVE-2026-25960 (9.8) vllm: SSRF allows internal network access. Same package: vllm
- CVE-2024-11041 (9.8) vllm: RCE via unsafe pickle deserialization in MessageQueue. Same package: vllm
- CVE-2025-32444 (9.8) vLLM: RCE via pickle deserialization on ZeroMQ. Same package: vllm