Any vLLM deployment on versions 0.10.2–0.11.0 exposing the Completions API is at risk of RCE from any authenticated (low-privilege) user who can submit prompt embeddings. Patch to 0.11.1 immediately; this is not a theoretical risk, because the exploit primitive (a crafted PyTorch sparse tensor deserialized via torch.load) is well-documented. If you cannot patch now, block prompt embedding inputs at the API gateway and audit who holds API credentials.
Risk Assessment
High risk for organizations serving vLLM in multi-tenant or externally-accessible environments. CVSS 8.8 with network vector, low complexity, and low privilege requirement means any API user can attempt exploitation — no admin access needed. EPSS is currently low (0.00128), suggesting no active exploitation at time of publication, but the underlying technique (deserialization of untrusted data + PyTorch 2.8.0 silent removal of sparse tensor bounds checks) creates a weaponizable exploit primitive that lowers the barrier for threat actors familiar with PyTorch internals. Highest exposure: cloud-hosted LLM inference endpoints, shared inference infrastructure, and SaaS platforms built on vLLM.
Recommended Action
6 steps:

1. PATCH: Upgrade vLLM to 0.11.1 immediately; this is the only definitive fix.
2. WORKAROUND (if patching is blocked): Disable or strip prompt_embedding_pool / embeddings fields at the API gateway before requests reach vLLM; reject any Completions API calls containing raw tensor payloads.
3. PYTORCH VERSION: If upgrading vLLM is not yet feasible, evaluate pinning PyTorch below 2.8.0 as a temporary control, but verify this does not break other dependencies.
4. ACCESS CONTROL: Restrict Completions API access to trusted, authenticated users only; treat the low privilege requirement (PR:L) as a red flag in shared environments.
5. DETECTION: Alert on unusually large or binary-encoded payload bodies in Completions API requests; monitor for vLLM process crashes (SIGSEGV, SIGBUS) that may indicate exploitation attempts.
6. NETWORK SEGMENTATION: Ensure vLLM inference servers are not directly internet-facing; place them behind an API gateway that can enforce input schema validation.
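The gateway workaround in step 2 can be sketched as a simple request filter. This is a minimal illustration, not vLLM or gateway code; the blocked field names are taken from this advisory and may need adjusting to the request schema your deployment actually accepts:

```python
import json

# Field names that can carry serialized tensor payloads (per this advisory;
# verify against the schema your vLLM version exposes).
BLOCKED_FIELDS = {"prompt_embeds", "prompt_embedding_pool", "embeddings"}

def is_request_allowed(raw_body: bytes) -> bool:
    """Return False for Completions API bodies that should be rejected
    before they reach vLLM."""
    try:
        body = json.loads(raw_body)
    except (ValueError, UnicodeDecodeError):
        # Non-JSON (e.g. raw binary) bodies never reach the backend.
        return False
    if not isinstance(body, dict):
        return False
    # Reject any request carrying a prompt-embedding style field.
    return not (BLOCKED_FIELDS & body.keys())
```

A stricter variant would allow-list only the fields the Completions API legitimately needs (model, prompt, max_tokens, and so on) and drop everything else, which also covers field names this sketch does not anticipate.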
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
What is CVE-2025-62164?
CVE-2025-62164 is a memory corruption vulnerability in the Completions API of vLLM versions 0.10.2 to before 0.11.1. User-supplied prompt embeddings are deserialized with torch.load() without sufficient validation, and because PyTorch 2.8.0 disables sparse tensor integrity checks by default, a maliciously crafted sparse tensor can trigger an out-of-bounds memory write during to_dense(). This can crash the server (denial of service) and potentially enable remote code execution by any authenticated, low-privilege API user. The issue is fixed in vLLM 0.11.1.
Is CVE-2025-62164 actively exploited?
No confirmed active exploitation of CVE-2025-62164 has been reported, but organizations should still patch proactively.
How to fix CVE-2025-62164?
1. PATCH: Upgrade vLLM to 0.11.1 immediately; this is the only definitive fix.
2. WORKAROUND (if patching is blocked): Disable or strip prompt_embedding_pool / embeddings fields at the API gateway before requests reach vLLM; reject any Completions API calls containing raw tensor payloads.
3. PYTORCH VERSION: If upgrading vLLM is not yet feasible, evaluate pinning PyTorch below 2.8.0 as a temporary control, but verify this does not break other dependencies.
4. ACCESS CONTROL: Restrict Completions API access to trusted, authenticated users only; treat the low privilege requirement (PR:L) as a red flag in shared environments.
5. DETECTION: Alert on unusually large or binary-encoded payload bodies in Completions API requests; monitor for vLLM process crashes (SIGSEGV, SIGBUS) that may indicate exploitation attempts.
6. NETWORK SEGMENTATION: Ensure vLLM inference servers are not directly internet-facing; place them behind an API gateway that can enforce input schema validation.
What systems are affected by CVE-2025-62164?
This vulnerability affects the following AI/ML architecture patterns: LLM inference API, model serving, RAG pipelines, shared inference infrastructure, LLM-as-a-service platforms.
What is the CVSS score for CVE-2025-62164?
CVE-2025-62164 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 0.19%.
Technical Details
NVD Description
vLLM is an inference and serving engine for large language models (LLMs). In versions 0.10.2 to before 0.11.1, a memory corruption vulnerability in the Completions API endpoint could lead to a crash (denial of service) and potentially remote code execution (RCE). When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to code execution on the server hosting vLLM. This issue has been patched in version 0.11.1.
Exploitation Scenario
An attacker obtains low-privilege API credentials to a vLLM-powered inference service (e.g., through credential stuffing, a leaked API key, or a free-tier account on a SaaS platform). They craft a malicious PyTorch sparse tensor with manipulated metadata — specifically, crafting sparse index/value buffers that violate internal bounds assumptions. With PyTorch 2.8.0+, the integrity checks that would catch this are disabled by default. The attacker serializes the tensor using Python's pickle format (torch.load's default), encodes it in a Completions API request as a prompt embedding, and submits it to vLLM's endpoint. When vLLM calls to_dense() on the tensor, the out-of-bounds write corrupts server memory. In a DoS scenario this crashes the process immediately. In an RCE scenario, a skilled attacker could use the write-what-where primitive (CWE-123) to overwrite function pointers or return addresses, achieving code execution on the inference host — potentially gaining access to model weights, training data, API keys stored in environment variables, and the broader server environment.
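The detection guidance above (flagging unusually large or binary-encoded request bodies) can be approximated with a simple heuristic at the gateway or WAF layer. A sketch only: the threshold values and field names here are illustrative assumptions, not documented vLLM behavior, and should be tuned against your legitimate traffic:

```python
import base64
import json

# Illustrative threshold; tune to the largest legitimate request you expect.
MAX_EMBED_CHARS = 64_000

def looks_suspicious(raw_body: bytes) -> bool:
    """Flag Completions requests whose embedding fields look like large
    base64/binary blobs rather than ordinary text prompts."""
    try:
        body = json.loads(raw_body)
    except (ValueError, UnicodeDecodeError):
        return True  # raw binary where JSON was expected
    if not isinstance(body, dict):
        return True
    for field in ("prompt_embeds", "embeddings"):  # assumed field names
        value = body.get(field)
        if isinstance(value, str):
            if len(value) > MAX_EMBED_CHARS:
                return True
            try:
                # Valid base64 decoding to a nontrivial blob is a strong
                # signal of a serialized (pickled) tensor payload.
                if len(base64.b64decode(value, validate=True)) > 1024:
                    return True
            except ValueError:
                pass  # not base64; fall through to allow
    return False
```

Such a heuristic cannot prove exploitation; pair it with crash monitoring (SIGSEGV, SIGBUS) on the vLLM hosts as recommended above.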
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
References
- github.com/advisories/GHSA-mrw7-hf4f-83pf
- nvd.nist.gov/vuln/detail/CVE-2025-62164
- github.com/vllm-project/vllm/commit/58fab50d82838d5014f4a14d991fdb9352c9c84b (patch)
- github.com/vllm-project/vllm/pull/27204 (issue, patch, vendor)
- github.com/vllm-project/vllm/security/advisories/GHSA-mrw7-hf4f-83pf (issue, vendor)
Related Vulnerabilities
- CVE-2024-9053 (9.8): vllm: RCE via unsafe pickle deserialization in RPC server (same package: vllm)
- CVE-2026-25960 (9.8): vllm: SSRF allows internal network access (same package: vllm)
- CVE-2025-47277 (9.8): vLLM: RCE via exposed TCPStore in distributed inference (same package: vllm)
- CVE-2024-11041 (9.8): vllm: RCE via unsafe pickle deserialization in MessageQueue (same package: vllm)
- CVE-2025-32444 (9.8): vLLM: RCE via pickle deserialization on ZeroMQ (same package: vllm)