Any vLLM deployment on versions 0.10.2–0.11.0 exposing the Completions API is at risk of RCE from any authenticated (low-privilege) user who can submit prompt embeddings. Patch to 0.11.1 immediately — this is not a theoretical risk, the exploit primitive (crafted PyTorch sparse tensor via torch.load) is well-documented. If you cannot patch now, block prompt embedding inputs at the API gateway and audit who holds API credentials.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vllm | pip | >= 0.10.2, < 0.11.1 | 0.11.1 |
Severity & Risk
CVSS 3.1 base score: 8.8 (High). Network-attackable with low privileges and no user interaction; full impact on confidentiality, integrity, and availability.
Recommended Action
1. PATCH: Upgrade vLLM to 0.11.1 immediately; this is the only definitive fix.
2. WORKAROUND (if patching is blocked): Disable or strip prompt_embedding_pool / embeddings fields at the API gateway before requests reach vLLM; reject any Completions API calls containing raw tensor payloads.
3. PYTORCH VERSION: If upgrading vLLM is not yet feasible, evaluate pinning PyTorch below 2.8.0 as a temporary control, but verify this does not break other dependencies.
4. ACCESS CONTROL: Restrict Completions API access to trusted, authenticated users only; treat PR:L as a red flag in shared environments.
5. DETECTION: Alert on unusually large or binary-encoded payload bodies in Completions API requests; monitor for vLLM process crashes (SIGSEGV, SIGBUS), which may indicate exploitation attempts.
6. NETWORK SEGMENTATION: Ensure vLLM inference servers are not directly internet-facing; place them behind an API gateway that can enforce input schema validation.
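The gateway workaround above can be sketched as a simple request filter. This is a minimal illustration, not a production WAF rule: the field names are taken from this advisory plus vLLM's `prompt_embeds` request parameter, and should be verified against the request schema your deployment actually accepts.

```python
import json

# Fields that may carry serialized tensors. "prompt_embeds" is the name
# used by vLLM's Completions API; the others come from this advisory and
# are defensive guesses. Adjust to match your deployment.
TENSOR_FIELDS = {"prompt_embeds", "prompt_embedding_pool", "embeddings"}


def reject_tensor_payload(raw_body: bytes) -> bool:
    """Return True if the gateway should reject this Completions request."""
    try:
        body = json.loads(raw_body)
    except (ValueError, UnicodeDecodeError):
        # Non-JSON bodies have no business on the Completions API.
        return True
    if not isinstance(body, dict):
        return True
    # Reject any request carrying an embedding/tensor field at all.
    return any(field in body for field in TENSOR_FIELDS)
```

A deny-by-field-name filter like this is coarse, which is acceptable here: while the workaround is in place, no client should be sending prompt embeddings at all.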
Classification
CVE-2025-62164 (GHSA-mrw7-hf4f-83pf)
Compliance Impact
This CVE is relevant to:
Technical Details
NVD Description
vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.10.2 to before 0.11.1, a memory corruption vulnerability exists in the Completions API endpoint that could lead to a crash (denial of service) and potentially remote code execution (RCE). When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to code execution on the server hosting vLLM. This issue has been patched in version 0.11.1.
Exploitation Scenario
An attacker obtains low-privilege API credentials to a vLLM-powered inference service (e.g., through credential stuffing, a leaked API key, or a free-tier account on a SaaS platform). They craft a malicious PyTorch sparse tensor with manipulated metadata — specifically, crafting sparse index/value buffers that violate internal bounds assumptions. With PyTorch 2.8.0+, the integrity checks that would catch this are disabled by default. The attacker serializes the tensor using Python's pickle format (torch.load's default), encodes it in a Completions API request as a prompt embedding, and submits it to vLLM's endpoint. When vLLM calls to_dense() on the tensor, the out-of-bounds write corrupts server memory. In a DoS scenario this crashes the process immediately. In an RCE scenario, a skilled attacker could use the write-what-where primitive (CWE-123) to overwrite function pointers or return addresses, achieving code execution on the inference host — potentially gaining access to model weights, training data, API keys stored in environment variables, and the broader server environment.
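Because the malicious tensor must travel inside the request body, one crude detection heuristic is to flag Completions requests whose string fields are long and decode cleanly as base64, which is characteristic of a serialized binary blob rather than a natural-language prompt. The threshold below is an illustrative assumption; tune it to your traffic and expect false positives and negatives.

```python
import base64
import json

# Illustrative threshold (assumption): natural-language prompts can also
# be long, so tune this per deployment and treat hits as signals to
# investigate, not as proof of exploitation.
MAX_FIELD_CHARS = 4096


def looks_like_tensor_payload(raw_body: bytes) -> bool:
    """Flag request bodies containing long, valid-base64 string fields."""
    try:
        body = json.loads(raw_body)
    except (ValueError, UnicodeDecodeError):
        return False
    if not isinstance(body, dict):
        return False
    for value in body.values():
        if isinstance(value, str) and len(value) > MAX_FIELD_CHARS:
            try:
                # validate=True rejects non-alphabet characters (spaces,
                # punctuation), so ordinary prose fails this decode.
                base64.b64decode(value, validate=True)
                return True
            except ValueError:
                continue
    return False
```

A heuristic like this belongs in the detection layer alongside crash monitoring; it does not replace the patch or the gateway-level rejection of embedding fields.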
Weaknesses (CWE)
- CWE-123: Write-what-where Condition
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
References
- github.com/advisories/GHSA-mrw7-hf4f-83pf
- github.com/vllm-project/vllm/commit/58fab50d82838d5014f4a14d991fdb9352c9c84b (Patch)
- github.com/vllm-project/vllm/pull/27204 (Issue, Patch, Vendor)
- github.com/vllm-project/vllm/security/advisories/GHSA-mrw7-hf4f-83pf (Issue, Vendor)
- nvd.nist.gov/vuln/detail/CVE-2025-62164