The original patch for CVE-2025-62164 in vLLM was a workaround, not a fix: it merely disabled prompt embeddings by default, leaving any deployment that re-enables the feature fully exposed to denial of service via malformed sparse tensors. Upgrade to vLLM 0.13.0 immediately; it introduces actual sparse tensor validation. If you cannot patch today, verify that the prompt embeddings feature flag is explicitly disabled in all your inference deployments.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vllm | pip | >= 0.10.2, < 0.11.1 | 0.13.0 |
If any of your deployments run vllm in the vulnerable range, treat them as affected.
Severity & Risk
Recommended Action
1. **PATCH:** Upgrade to vLLM 0.13.0, which includes the real fix (sparse tensor index validation via PR #30649).
2. **IMMEDIATE WORKAROUND:** Confirm the `enable_prompt_embeds` flag is explicitly set to `False` in all vLLM serving configs; do not rely on the default.
3. **NETWORK CONTROLS:** Restrict vLLM inference API endpoints to trusted internal callers; avoid direct public exposure without an authenticated gateway.
4. **DETECTION:** Monitor for requests with unusually large or malformed embedding payloads; anomalous memory usage spikes or inference worker crashes should trigger alert review.
5. **AUDIT:** Identify all internal services and pipelines that call vLLM directly and assess whether prompt embedding is in use.
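The audit steps above can be partially automated. The sketch below classifies a vLLM serve command line by whether prompt embeds is explicitly enabled, explicitly disabled, or left at the default; the exact flag spellings (`--enable-prompt-embeds`, `--no-enable-prompt-embeds`) follow vLLM's CLI convention but should be verified against the version you run.

```python
import shlex

# Flag spellings assumed from vLLM's CLI convention; verify against your version.
ENABLE_FLAG = "--enable-prompt-embeds"
DISABLE_FLAG = "--no-enable-prompt-embeds"

def audit_serve_command(cmdline: str) -> str:
    """Classify a vLLM serve command as 'enabled', 'disabled', or 'default'.

    Per this advisory, 'default' should still be treated as a finding:
    do not rely on the default, set the flag explicitly.
    """
    args = shlex.split(cmdline)
    if ENABLE_FLAG in args:
        return "enabled"
    if DISABLE_FLAG in args:
        return "disabled"
    return "default"

# Flag present: exposed unless the deployment runs >= 0.13.0
print(audit_serve_command("vllm serve meta-llama/Llama-3-8B --enable-prompt-embeds"))  # enabled
# Flag absent: safe by default, but make the setting explicit
print(audit_serve_command("vllm serve meta-llama/Llama-3-8B"))  # default
```

Feeding this the serve invocations from your deployment manifests (systemd units, Helm values, Docker entrypoints) gives a quick fleet-wide inventory for step 5.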
Classification
Compliance Impact
This CVE is relevant to:
Technical Details
NVD Description
### Summary

The fix [here](https://github.com/vllm-project/vllm/pull/27204) for CVE-2025-62164 is not sufficient. It only disables prompt embeds by default rather than addressing the root cause, so the DoS vulnerability remains whenever the feature is enabled.

### Details

vLLM's pending change addresses the root cause: missing sparse tensor validation. PyTorch (since ~v2.0) disables sparse tensor invariant checks by default for performance reasons. vLLM is adding validation to ensure sparse tensor indices are valid, non-negative, and within bounds; these checks catch malformed tensors before they reach the compute path.

### PoC

NA

### Impact

The current fix only added a flag to enable/disable prompt embeds. Because the feature is disabled by default, DoS attacks through embeddings are stopped in default configurations. However, it does not address the problem when the flag is enabled, and the potential for DoS attacks remains.

### Changes

* https://github.com/vllm-project/vllm/pull/30649
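To make the missing check concrete, here is a minimal, dependency-free sketch of the kind of COO index validation the real fix adds. This is illustrative, not vLLM's actual code: it checks that every index is non-negative and within the bounds of its dimension, using the same layout as `torch.sparse_coo_tensor` (one index row per dimension, one column per non-zero element).

```python
from typing import Sequence

def validate_coo_indices(indices: Sequence[Sequence[int]],
                         shape: Sequence[int]) -> None:
    """Reject sparse COO indices that are negative or out of bounds.

    Without such a check, out-of-bounds indices flow straight into the
    compute path and can trigger out-of-bounds memory writes.
    """
    if len(indices) != len(shape):
        raise ValueError(f"expected {len(shape)} index rows, got {len(indices)}")
    for dim, (row, size) in enumerate(zip(indices, shape)):
        for idx in row:
            if idx < 0 or idx >= size:
                raise ValueError(
                    f"index {idx} out of bounds for dim {dim} of size {size}")

# Well-formed 2x3 sparse layout: non-zeros at (0, 1) and (1, 2) pass silently
validate_coo_indices([[0, 1], [1, 2]], shape=(2, 3))

# Malformed: column index 7 exceeds dim size 3; raises ValueError
try:
    validate_coo_indices([[0, 1], [1, 7]], shape=(2, 3))
except ValueError as e:
    print("rejected:", e)
```

In PyTorch itself, the equivalent checks can be re-enabled per tensor via `torch.sparse_coo_tensor(..., check_invariants=True)` or globally via `torch.sparse.check_sparse_tensor_invariants`, at some performance cost.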
Exploitation Scenario
An attacker with API credentials to a vLLM-backed inference endpoint (a compromised service account, a malicious insider, or a tenant in a multi-tenant deployment) constructs a request containing prompt embeddings with malformed PyTorch sparse tensors: indices that are negative, out of bounds, or that otherwise violate sparse tensor invariants. Since PyTorch v2.0 disabled these invariant checks by default for performance, vLLM (without the real fix) passes the tensor to the compute path unchecked. This triggers an out-of-bounds memory write, crashing the inference worker process (DoS) or potentially corrupting adjacent memory. Because vLLM is typically deployed as a shared inference backend serving multiple users or services, a single malformed request takes down availability for all consumers.
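For the detection guidance above, a gateway in front of the endpoint can pre-filter the crudest malformed or resource-exhaustion payloads before they reach a worker. The sketch below assumes embedding payloads arrive base64-encoded (as in vLLM's OpenAI-compatible API) and applies a deployment-specific size ceiling; the `MAX_EMBED_BYTES` value is an assumption to tune, and note this coarse check cannot validate sparse tensor invariants, which requires deserializing the tensor.

```python
import base64

# Assumed ceiling for a serialized prompt-embedding tensor; tune per deployment.
MAX_EMBED_BYTES = 8 * 1024 * 1024

def embed_payload_suspicious(b64_payload: str) -> bool:
    """Flag prompt-embedding payloads that are oversized or not valid base64.

    A coarse gateway pre-filter; the real defense is the sparse tensor
    validation in vLLM 0.13.0.
    """
    try:
        # binascii.Error (raised on invalid input) is a ValueError subclass
        raw = base64.b64decode(b64_payload, validate=True)
    except ValueError:
        return True
    return len(raw) > MAX_EMBED_BYTES

print(embed_payload_suspicious(base64.b64encode(b"\x00" * 1024).decode()))  # False
print(embed_payload_suspicious("!!!not base64!!!"))  # True
```

Suspicious hits feed naturally into the same alert review as worker crashes and memory spikes.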
Weaknesses (CWE)
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H