CVE-2025-62164: vllm: Input Validation flaw enables exploitation

GHSA-mrw7-hf4f-83pf HIGH
Published November 21, 2025
CISO Take

Any vLLM deployment on versions 0.10.2–0.11.0 exposing the Completions API is at risk of RCE from any authenticated (low-privilege) user who can submit prompt embeddings. Patch to 0.11.1 immediately — this is not a theoretical risk: the exploit primitive (a crafted PyTorch sparse tensor deserialized via torch.load) is well documented. If you cannot patch now, block prompt-embedding inputs at the API gateway and audit who holds API credentials.
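The affected range can be checked mechanically before rollout decisions. A minimal sketch; the `is_vulnerable` helper is illustrative, not part of vLLM, and assumes plain `X.Y.Z` version strings:

```python
from typing import Tuple

def parse_version(v: str) -> Tuple[int, ...]:
    """Parse a simple 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def is_vulnerable(installed: str) -> bool:
    """True if the installed vLLM version falls in the affected range
    for CVE-2025-62164 (>= 0.10.2, < 0.11.1)."""
    v = parse_version(installed)
    return parse_version("0.10.2") <= v < parse_version("0.11.1")
```

For example, `is_vulnerable("0.11.0")` is True while `is_vulnerable("0.11.1")` is False; a pre-release or vendor-suffixed version string would need a richer parser (e.g. `packaging.version`).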

Risk Assessment

High risk for organizations serving vLLM in multi-tenant or externally-accessible environments. CVSS 8.8 with network vector, low complexity, and low privilege requirement means any API user can attempt exploitation — no admin access needed. EPSS is currently low (0.00128), suggesting no active exploitation at time of publication, but the underlying technique (deserialization of untrusted data + PyTorch 2.8.0 silent removal of sparse tensor bounds checks) creates a weaponizable exploit primitive that lowers the barrier for threat actors familiar with PyTorch internals. Highest exposure: cloud-hosted LLM inference endpoints, shared inference infrastructure, and SaaS platforms built on vLLM.

Affected Systems

Package  Ecosystem  Vulnerable Range     Patched
vllm     pip        >= 0.10.2, < 0.11.1  0.11.1

Severity & Risk

CVSS 3.1
8.8 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 41% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Moderate

Attack Surface

AV (Attack Vector): Network
AC (Attack Complexity): Low
PR (Privileges Required): Low
UI (User Interaction): None
S (Scope): Unchanged
C (Confidentiality): High
I (Integrity): High
A (Availability): High

Recommended Action

  1. PATCH

    Upgrade vLLM to 0.11.1 immediately — this is the only definitive fix.

  2. WORKAROUND

    If patching is blocked, disable or strip the prompt_embedding_pool / embeddings fields at the API gateway before requests reach vLLM, and reject any Completions API calls containing raw tensor payloads.

  3. PYTORCH VERSION

    If upgrading vLLM is not yet feasible, evaluate pinning PyTorch below 2.8.0 as a temporary control — but verify this does not break other dependencies.

  4. ACCESS CONTROL

    Restrict Completions API access to trusted, authenticated users only; treat PR:L as a red flag in shared environments.

  5. DETECTION

    Alert on unusually large or binary-encoded payload bodies in Completions API requests; monitor for vLLM process crashes (SIGSEGV, SIGBUS) which may indicate exploitation attempts.

  6. NETWORK SEGMENTATION

    Ensure vLLM inference servers are not directly internet-facing; place behind an API gateway that can enforce input schema validation.
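The workaround and detection steps above can be combined into a single pre-filter at the gateway. A minimal sketch in plain Python; the field names and size threshold are illustrative assumptions to adapt to your deployment, not values mandated by vLLM:

```python
import json

# Field names that may carry serialized tensor payloads; adjust to
# match your deployment (these names are illustrative examples).
EMBEDDING_FIELDS = {"prompt_embeds", "prompt_embedding_pool", "embeddings"}
MAX_BODY_BYTES = 1_000_000  # arbitrary example threshold

def should_reject(raw_body: bytes) -> bool:
    """True if a Completions request body should be rejected before it
    reaches vLLM: oversized, binary/malformed, or carrying embedding fields."""
    if len(raw_body) > MAX_BODY_BYTES:
        return True
    try:
        body = json.loads(raw_body)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return True  # binary or malformed payload
    if not isinstance(body, dict):
        return True
    return any(field in body for field in EMBEDDING_FIELDS)
```

In practice this logic would live in gateway middleware (e.g. an Nginx/Envoy filter or a FastAPI dependency); rejected requests should also be logged to support the detection step.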

CISA SSVC Assessment

Decision: Track
Exploitation: none
Automatable: No
Technical Impact: total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
  Article 15 - Accuracy, robustness and cybersecurity
  Article 9 - Risk management system
ISO 42001
  A.6.1.5 - AI system security
  A.6.2.6 - Testing and validation of AI systems
NIST AI RMF
  GOVERN 6.2 - Policies and procedures are in place for AI risk management
  MANAGE 2.2 - Mechanisms are in place to respond to and recover from risks
OWASP LLM Top 10
  LLM03:2025 - Supply Chain Vulnerabilities
  LLM08:2025 - Vector and Embedding Weaknesses

Frequently Asked Questions

What is CVE-2025-62164?

Any vLLM deployment on versions 0.10.2–0.11.0 exposing the Completions API is at risk of RCE from any authenticated (low-privilege) user who can submit prompt embeddings. Patch to 0.11.1 immediately — this is not a theoretical risk: the exploit primitive (a crafted PyTorch sparse tensor deserialized via torch.load) is well documented. If you cannot patch now, block prompt-embedding inputs at the API gateway and audit who holds API credentials.

Is CVE-2025-62164 actively exploited?

No confirmed active exploitation of CVE-2025-62164 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-62164?

1. PATCH: Upgrade vLLM to 0.11.1 immediately — this is the only definitive fix.
2. WORKAROUND (if patch is blocked): Disable or strip prompt_embedding_pool / embeddings fields at the API gateway before requests reach vLLM; reject any Completions API calls containing raw tensor payloads.
3. PYTORCH VERSION: If upgrading vLLM is not yet feasible, evaluate pinning PyTorch below 2.8.0 as a temporary control — but verify this does not break other dependencies.
4. ACCESS CONTROL: Restrict Completions API access to trusted, authenticated users only; treat PR:L as a red flag in shared environments.
5. DETECTION: Alert on unusually large or binary-encoded payload bodies in Completions API requests; monitor for vLLM process crashes (SIGSEGV, SIGBUS) which may indicate exploitation attempts.
6. NETWORK SEGMENTATION: Ensure vLLM inference servers are not directly internet-facing; place them behind an API gateway that can enforce input schema validation.

What systems are affected by CVE-2025-62164?

This vulnerability affects the following AI/ML architecture patterns: LLM inference API, model serving, RAG pipelines, shared inference infrastructure, LLM-as-a-service platforms.

What is the CVSS score for CVE-2025-62164?

CVE-2025-62164 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 0.19%.

Technical Details

NVD Description

vLLM is an inference and serving engine for large language models (LLMs). In versions 0.10.2 up to but not including 0.11.1, a memory corruption vulnerability that could lead to a crash (denial of service) and potentially remote code execution (RCE) exists in the Completions API endpoint. When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to code execution on the server hosting vLLM. This issue has been patched in version 0.11.1.

Exploitation Scenario

An attacker obtains low-privilege API credentials to a vLLM-powered inference service (e.g., through credential stuffing, a leaked API key, or a free-tier account on a SaaS platform). They craft a malicious PyTorch sparse tensor with manipulated metadata: sparse index/value buffers built to violate internal bounds assumptions. With PyTorch 2.8.0+, the integrity checks that would catch this are disabled by default. The attacker serializes the tensor using Python's pickle format (torch.load's default), encodes it in a Completions API request as a prompt embedding, and submits it to vLLM's endpoint. When vLLM calls to_dense() on the tensor, the out-of-bounds write corrupts server memory. In a DoS scenario this crashes the process immediately. In an RCE scenario, a skilled attacker could use the write-what-where primitive (CWE-123) to overwrite function pointers or return addresses, achieving code execution on the inference host — potentially gaining access to model weights, training data, API keys stored in environment variables, and the broader server environment.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H

Timeline

Published
November 21, 2025
Last Modified
December 4, 2025
First Seen
November 21, 2025

Related Vulnerabilities