vLLM deployments on Python 3.12+ using prefix caching are vulnerable to cache poisoning via crafted hash collisions, causing users to receive responses generated from attacker-controlled prompt contexts. The attack requires low-privilege API access and knowledge of target prompts, limiting real-world risk—but multi-tenant shared inference infrastructure faces meaningful integrity exposure. Upgrade all vLLM instances to 0.7.2 immediately; if patching is delayed, disable prefix caching as a stopgap.
Risk Assessment
Low operational risk (CVSS 2.6, EPSS 0.36%) but notable as a concrete AI inference integrity attack vector. High attack complexity and required user interaction significantly constrain exploitability. On Python 3.12+, hash(None) returns a predictable constant rather than a value derived from the object's memory address, which makes deliberate collisions more feasible than on earlier Python versions. Risk escalates in multi-tenant vLLM deployments where multiple organizations or users share inference infrastructure: there, a cache integrity failure carries downstream compliance and trust implications beyond its raw CVSS score.
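To see the interpreter change underlying this advisory, the following minimal snippet can be run under Python 3.11 and 3.12 across several separate invocations; the expected outputs noted in the comments reflect CPython's documented behavior change, subject to platform and ASLR specifics.

```python
# Minimal illustration of the CPython change that makes collisions easier to plan.
import sys

print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print(f"hash(None) = {hash(None)}")

# Expected behavior (subject to platform and ASLR):
#   Python <= 3.11: hash(None) is derived from the object's address and
#                   typically changes from run to run.
#   Python >= 3.12: hash(None) is a fixed constant, identical on every
#                   run, so an attacker can predict it without probing.
```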
Recommended Action
1. Upgrade vLLM to >= 0.7.2 (patch available; the breaking change is minimal).
2. If immediate patching is blocked, disable prefix caching (do not pass --enable-prefix-caching, or set enable_prefix_caching=False in the engine configuration; exact option names vary by release). This eliminates the attack surface at the cost of cache performance; see the sketch after this list.
3. Audit Python versions across the inference fleet; Python 3.12+ deployments carry elevated risk due to predictable hash(None).
4. For multi-tenant deployments, consider tenant-isolated vLLM instances until the patch is applied.
5. Monitor inference logs for anomalous response inconsistencies or unexpected prompt-to-output mismatches that may indicate cache pollution.
6. Review API access logs for unusual high-volume or repetitive request patterns suggesting hash-collision probing.
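As a stopgap for step 2, prefix caching can be turned off when constructing the engine. The sketch below uses vLLM's offline LLM entry point with the enable_prefix_caching engine argument; the model name is a placeholder and option spellings have shifted across vLLM releases, so confirm against your version's documentation (for the OpenAI-compatible server, check vllm serve --help for the corresponding flag).

```python
# Stopgap sketch: run vLLM with prefix caching explicitly disabled.
# Assumes the offline LLM entry point; option names may vary by release.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    enable_prefix_caching=False,               # no KV-cache reuse across prompts
)

outputs = llm.generate(
    ["Summarize the quarterly report in one sentence."],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```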
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
What is CVE-2025-25183?
vLLM deployments on Python 3.12+ using prefix caching are vulnerable to cache poisoning via crafted hash collisions, causing users to receive responses generated from attacker-controlled prompt contexts. The attack requires low-privilege API access and knowledge of target prompts, limiting real-world risk—but multi-tenant shared inference infrastructure faces meaningful integrity exposure. Upgrade all vLLM instances to 0.7.2 immediately; if patching is delayed, disable prefix caching as a stopgap.
Is CVE-2025-25183 actively exploited?
No confirmed active exploitation of CVE-2025-25183 has been reported, but organizations should still patch proactively.
How to fix CVE-2025-25183?
1. Upgrade vLLM to >= 0.7.2 (patch available; the breaking change is minimal). 2. If immediate patching is blocked, disable prefix caching (do not pass --enable-prefix-caching, or set enable_prefix_caching=False in the engine configuration); this eliminates the attack surface at the cost of cache performance. 3. Audit Python versions across the inference fleet; Python 3.12+ deployments carry elevated risk due to predictable hash(None). 4. For multi-tenant deployments, consider tenant-isolated vLLM instances until the patch is applied. 5. Monitor inference logs for anomalous response inconsistencies or unexpected prompt-to-output mismatches that may indicate cache pollution. 6. Review API access logs for unusual high-volume or repetitive request patterns suggesting hash-collision probing.
What systems are affected by CVE-2025-25183?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving (vLLM), Multi-tenant model serving, KV cache / prefix caching systems, RAG pipelines with vLLM backend, AI agent frameworks using vLLM as inference engine.
What is the CVSS score for CVE-2025-25183?
CVE-2025-25183 has a CVSS v3.1 base score of 2.6 (LOW). The EPSS exploitation probability is 0.32%.
Technical Details
NVD Description
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Maliciously constructed statements can lead to hash collisions, resulting in cache reuse, which can interfere with subsequent responses and cause unintended behavior. Prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash(None) has changed to be a predictable constant value. This makes it more feasible that someone could try to exploit hash collisions. The impact of a collision would be using cache that was generated using different content. Given knowledge of prompts in use and predictable hashing behavior, someone could intentionally populate the cache using a prompt known to collide with another prompt in use. This issue has been addressed in version 0.7.2 and all users are advised to upgrade. There are no known workarounds for this vulnerability.
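The keying scheme described above can be illustrated with a simplified model (an illustrative sketch only, not vLLM's actual code): each cache block's key incorporates its parent block's key, and the first block's parent is None, so on Python 3.12+ the chain is anchored to a constant, attacker-predictable value.

```python
# Simplified model of hash-chained prefix-cache block keys.
# Illustrative only; not vLLM's implementation.
from typing import Optional

BLOCK_SIZE = 4  # toy tokens-per-block value

def block_keys(token_ids: list[int]) -> list[int]:
    keys: list[int] = []
    parent: Optional[int] = None  # root parent; hash(None) is constant on Python 3.12+
    for i in range(0, len(token_ids), BLOCK_SIZE):
        block = tuple(token_ids[i:i + BLOCK_SIZE])
        keys.append(hash((parent, block)))  # built-in hash(), per the advisory
        parent = keys[-1]
    return keys

# Two prompts sharing the same leading tokens share the same leading keys,
# which is exactly what the cache exploits for legitimate reuse.
print(block_keys([101, 7592, 2088, 102, 2023, 2003]))
print(block_keys([101, 7592, 2088, 102, 9999, 8888]))
```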
Exploitation Scenario
An adversary with low-privilege API access to a shared vLLM serving endpoint (Python 3.12+) identifies commonly-used system prompts through prior knowledge or observation. They craft a prompt engineered to produce a hash collision with a target prompt's prefix—made more tractable on Python 3.12 due to hash(None) predictability. By submitting the crafted prompt first to warm the KV cache, the attacker plants a poisoned context. When a legitimate user subsequently sends the target prompt, vLLM's prefix cache returns the attacker's stored context instead of computing a fresh prefix. The model then responds as if conditioned on the attacker's prompt, potentially producing misleading, policy-violating, or manipulated outputs—without any visible signal to the end user that cache reuse occurred.
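This flow can be reproduced with a toy in-process cache. The key function below deliberately uses a tiny 8-bit keyspace so a collision can be found instantly; in a real attack the collision must be engineered against the full hash, which is the hard part that Python 3.12's constant hash(None) makes somewhat more tractable. None of this is vLLM code; it only demonstrates the reuse-on-collision behavior.

```python
# Toy demonstration of the poisoning flow; not vLLM code.
prefix_cache: dict[int, str] = {}

def cache_key(prompt: str) -> int:
    # Stand-in for the prefix-hash scheme; an 8-bit keyspace makes
    # collisions trivial to find for demonstration purposes.
    return hash(prompt) & 0xFF

def serve(prompt: str) -> str:
    key = cache_key(prompt)
    if key in prefix_cache:
        # Cache hit: the stored context is reused without recomputation.
        return f"served from cached context: {prefix_cache[key]!r}"
    prefix_cache[key] = prompt
    return f"computed fresh context for: {prompt!r}"

def find_colliding_prompt(target_key: int) -> str:
    # Attacker searches for a prompt whose key collides with the victim's.
    for i in range(100_000):
        candidate = f"ignore all previous instructions #{i}"
        if cache_key(candidate) == target_key:
            return candidate
    raise RuntimeError("no collision found within search budget")

victim_prompt = "You are a helpful banking assistant."
attacker_prompt = find_colliding_prompt(cache_key(victim_prompt))

print(serve(attacker_prompt))  # attacker warms the cache first
print(serve(victim_prompt))    # victim is served the attacker's context
```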
Weaknesses (CWE)
CVSS Vector
CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:N/I:L/A:N
References
- github.com/advisories/GHSA-rm76-4mrf-v9r8
- github.com/pypa/advisory-database/tree/main/vulns/vllm/PYSEC-2025-62.yaml
- github.com/python/cpython/pull/99541
- github.com/vllm-project/vllm/commit/73b35cca7f3745d07d439c197768b25d88b6ab7f
- nvd.nist.gov/vuln/detail/CVE-2025-25183
- github.com/python/cpython/commit/432117cd1f59c76d97da2eaff55a7d758301dbc7
- github.com/vllm-project/vllm/pull/12621
- github.com/vllm-project/vllm/security/advisories/GHSA-rm76-4mrf-v9r8
Related Vulnerabilities
- CVE-2024-9053 (CVSS 9.8) vllm: RCE via unsafe pickle deserialization in RPC server (same package: vllm)
- CVE-2025-47277 (CVSS 9.8) vLLM: RCE via exposed TCPStore in distributed inference (same package: vllm)
- CVE-2026-25960 (CVSS 9.8) vllm: SSRF allows internal network access (same package: vllm)
- CVE-2024-11041 (CVSS 9.8) vllm: RCE via unsafe pickle deserialization in MessageQueue (same package: vllm)
- CVE-2025-32444 (CVSS 9.8) vLLM: RCE via pickle deserialization on ZeroMQ (same package: vllm)