CVE-2025-62164: vllm: Input Validation flaw enables exploitation

GHSA-mrw7-hf4f-83pf HIGH
Published November 21, 2025
CISO Take

Any vLLM deployment on versions 0.10.2–0.11.0 exposing the Completions API is at risk of RCE from any authenticated (low-privilege) user who can submit prompt embeddings. Patch to 0.11.1 immediately — this is not a theoretical risk: the exploit primitive (a crafted PyTorch sparse tensor deserialized via torch.load) is well documented. If you cannot patch now, block prompt-embedding inputs at the API gateway and audit who holds API credentials.
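The affected range can be checked mechanically before rollout decisions. A minimal sketch; the `is_vulnerable` helper is illustrative, not part of vLLM, and assumes plain `X.Y.Z` version strings:

```python
from typing import Tuple

def parse_version(v: str) -> Tuple[int, ...]:
    """Parse a simple 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def is_vulnerable(installed: str) -> bool:
    """True if the installed vLLM version falls in the affected range
    for CVE-2025-62164 (>= 0.10.2, < 0.11.1)."""
    v = parse_version(installed)
    return parse_version("0.10.2") <= v < parse_version("0.11.1")
```

For example, `is_vulnerable("0.11.0")` is True while `is_vulnerable("0.11.1")` is False; a pre-release or vendor-suffixed version string would need a richer parser (e.g. `packaging.version`).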

Risk Assessment

High risk for organizations serving vLLM in multi-tenant or externally-accessible environments. CVSS 8.8 with network vector, low complexity, and low privilege requirement means any API user can attempt exploitation — no admin access needed. EPSS is currently low (0.00128), suggesting no active exploitation at time of publication, but the underlying technique (deserialization of untrusted data + PyTorch 2.8.0 silent removal of sparse tensor bounds checks) creates a weaponizable exploit primitive that lowers the barrier for threat actors familiar with PyTorch internals. Highest exposure: cloud-hosted LLM inference endpoints, shared inference infrastructure, and SaaS platforms built on vLLM.

Affected Systems

Package  Ecosystem  Vulnerable Range     Patched
vllm     pip        >= 0.10.2, < 0.11.1  0.11.1

Severity & Risk

CVSS 3.1
8.8 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 41% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Moderate

Attack Surface

AV (Attack Vector): Network
AC (Attack Complexity): Low
PR (Privileges Required): Low
UI (User Interaction): None
S (Scope): Unchanged
C (Confidentiality): High
I (Integrity): High
A (Availability): High

Recommended Action

  1. PATCH

    Upgrade vLLM to 0.11.1 immediately — this is the only definitive fix.

  2. WORKAROUND

    If patching is blocked, disable or strip the prompt_embedding_pool / embeddings fields at the API gateway before requests reach vLLM, and reject any Completions API calls containing raw tensor payloads.

  3. PYTORCH VERSION

    If upgrading vLLM is not yet feasible, evaluate pinning PyTorch below 2.8.0 as a temporary control — but verify this does not break other dependencies.

  4. ACCESS CONTROL

    Restrict Completions API access to trusted, authenticated users only; treat PR:L as a red flag in shared environments.

  5. DETECTION

    Alert on unusually large or binary-encoded payload bodies in Completions API requests; monitor for vLLM process crashes (SIGSEGV, SIGBUS) which may indicate exploitation attempts.

  6. NETWORK SEGMENTATION

    Ensure vLLM inference servers are not directly internet-facing; place behind an API gateway that can enforce input schema validation.
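The workaround and detection steps above can be combined into a single pre-filter at the gateway. A minimal sketch in plain Python; the field names and size threshold are illustrative assumptions to adapt to your deployment, not values mandated by vLLM:

```python
import json

# Field names that may carry serialized tensor payloads; adjust to
# match your deployment (these names are illustrative examples).
EMBEDDING_FIELDS = {"prompt_embeds", "prompt_embedding_pool", "embeddings"}
MAX_BODY_BYTES = 1_000_000  # arbitrary example threshold

def should_reject(raw_body: bytes) -> bool:
    """True if a Completions request body should be rejected before it
    reaches vLLM: oversized, binary/malformed, or carrying embedding fields."""
    if len(raw_body) > MAX_BODY_BYTES:
        return True
    try:
        body = json.loads(raw_body)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return True  # binary or malformed payload
    if not isinstance(body, dict):
        return True
    return any(field in body for field in EMBEDDING_FIELDS)
```

In practice this logic would live in gateway middleware (e.g. an Nginx/Envoy filter or a FastAPI dependency); rejected requests should also be logged to support the detection step.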

CISA SSVC Assessment

Decision: Track
Exploitation: none
Automatable: No
Technical Impact: total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
  Article 15 - Accuracy, robustness and cybersecurity
  Article 9 - Risk management system
ISO 42001
  A.6.1.5 - AI system security
  A.6.2.6 - Testing and validation of AI systems
NIST AI RMF
  GOVERN 6.2 - Policies and procedures are in place for AI risk management
  MANAGE 2.2 - Mechanisms are in place to respond to and recover from risks
OWASP LLM Top 10
  LLM03:2025 - Supply Chain Vulnerabilities
  LLM08:2025 - Vector and Embedding Weaknesses

Frequently Asked Questions

What is CVE-2025-62164?

Any vLLM deployment on versions 0.10.2–0.11.0 exposing the Completions API is at risk of RCE from any authenticated (low-privilege) user who can submit prompt embeddings. Patch to 0.11.1 immediately — this is not a theoretical risk: the exploit primitive (a crafted PyTorch sparse tensor deserialized via torch.load) is well documented. If you cannot patch now, block prompt-embedding inputs at the API gateway and audit who holds API credentials.

Is CVE-2025-62164 actively exploited?

No confirmed active exploitation of CVE-2025-62164 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-62164?

1. PATCH: Upgrade vLLM to 0.11.1 immediately — this is the only definitive fix.
2. WORKAROUND (if patch is blocked): Disable or strip prompt_embedding_pool / embeddings fields at the API gateway before requests reach vLLM; reject any Completions API calls containing raw tensor payloads.
3. PYTORCH VERSION: If upgrading vLLM is not yet feasible, evaluate pinning PyTorch below 2.8.0 as a temporary control — but verify this does not break other dependencies.
4. ACCESS CONTROL: Restrict Completions API access to trusted, authenticated users only; treat PR:L as a red flag in shared environments.
5. DETECTION: Alert on unusually large or binary-encoded payload bodies in Completions API requests; monitor for vLLM process crashes (SIGSEGV, SIGBUS) which may indicate exploitation attempts.
6. NETWORK SEGMENTATION: Ensure vLLM inference servers are not directly internet-facing; place them behind an API gateway that can enforce input schema validation.

What systems are affected by CVE-2025-62164?

This vulnerability affects the following AI/ML architecture patterns: LLM inference API, model serving, RAG pipelines, shared inference infrastructure, LLM-as-a-service platforms.

What is the CVSS score for CVE-2025-62164?

CVE-2025-62164 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 0.19%.

Technical Details

NVD Description

vLLM is an inference and serving engine for large language models (LLMs). In versions 0.10.2 up to but not including 0.11.1, a memory corruption vulnerability that could lead to a crash (denial of service) and potentially remote code execution (RCE) exists in the Completions API endpoint. When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to code execution on the server hosting vLLM. This issue has been patched in version 0.11.1.

Exploitation Scenario

An attacker obtains low-privilege API credentials to a vLLM-powered inference service (e.g., through credential stuffing, a leaked API key, or a free-tier account on a SaaS platform). They craft a malicious PyTorch sparse tensor with manipulated metadata: sparse index/value buffers built to violate internal bounds assumptions. With PyTorch 2.8.0+, the integrity checks that would catch this are disabled by default. The attacker serializes the tensor using Python's pickle format (torch.load's default), encodes it in a Completions API request as a prompt embedding, and submits it to vLLM's endpoint. When vLLM calls to_dense() on the tensor, the out-of-bounds write corrupts server memory. In a DoS scenario this crashes the process immediately. In an RCE scenario, a skilled attacker could use the write-what-where primitive (CWE-123) to overwrite function pointers or return addresses, achieving code execution on the inference host — potentially gaining access to model weights, training data, API keys stored in environment variables, and the broader server environment.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H

Timeline

Published
November 21, 2025
Last Modified
December 4, 2025
First Seen
November 21, 2025

Related Vulnerabilities