CVE-2024-8939: ilab/vllm: best_of param causes inference API DoS

MEDIUM PoC AVAILABLE
Published September 17, 2024
CISO Take

An attacker with local API access can crash your vllm inference server by sending requests with an inflated best_of parameter, consuming all compute resources until the service becomes unresponsive. Patch ilab immediately if running model serve in any shared or multi-tenant environment. If patching is not immediate, cap best_of input at the API gateway and enforce per-request timeouts.

Risk Assessment

Despite a medium CVSS (6.2), the local attack vector (AV:L) narrows real-world exposure to contexts where the vllm API is reachable from localhost or an internal network. In containerized AI serving clusters, shared dev boxes, or misconfigured Kubernetes namespaces, that boundary is often non-existent. Attack complexity is low, no credentials are required, and the exploitation path is trivial — a single crafted request can trigger cascading resource exhaustion. Availability loss on inference infrastructure is high-impact in any production LLM serving stack.

Severity & Risk

CVSS 3.1: 6.2 / 10
EPSS: 0.03% chance of exploitation in 30 days (higher than 7% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium (public PoC indexed in trickest/cve)

Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High

Recommended Action

6 steps
  1. Apply the Red Hat patch referenced in the advisory (bugzilla.redhat.com/2312782).

  2. Enforce a hard cap on best_of at the API gateway or reverse proxy layer — reject any request with best_of > 5.

  3. Configure per-request inference timeouts at the vllm level to bound maximum resource consumption.

  4. Apply rate limiting per client IP or API token on all inference endpoints.

  5. Restrict network access to the vllm API to authorized internal clients only (firewall or network policy).

  6. Alert on sustained CPU/GPU utilization spikes above baseline on inference nodes as an early DoS detection signal.
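Step 2 can be enforced before a request ever reaches vllm. A minimal sketch of a gateway-side validator, assuming a JSON body shaped like the vllm completion API (the cap value and helper name are illustrative, not part of any vllm or ilab API):

```python
import json

# Illustrative cap; tune to the largest best_of your workload legitimately needs.
MAX_BEST_OF = 5

def validate_completion_request(raw_body: str) -> dict:
    """Parse an inference request body and reject inflated best_of values.

    Returns the parsed body if acceptable; raises ValueError otherwise, so the
    proxy can return HTTP 400 without ever forwarding to the inference backend.
    """
    body = json.loads(raw_body)
    best_of = body.get("best_of", 1)
    if not isinstance(best_of, int) or best_of < 1:
        raise ValueError("best_of must be a positive integer")
    if best_of > MAX_BEST_OF:
        raise ValueError(f"best_of={best_of} exceeds cap of {MAX_BEST_OF}")
    return body
```

At a reverse proxy or API gateway, this check runs before forwarding; rejected requests never consume inference resources, which is what makes the cap an effective stopgap until the patch lands.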

CISA SSVC Assessment

Decision: Track
Exploitation: none
Automatable: No
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Compliance Impact

This CVE is relevant to:

EU AI Act: Art. 9 - Risk Management System
ISO 42001: A.9.2 - AI System Availability and Resilience
NIST AI RMF: MANAGE-2.2 - Mechanisms for sustaining value of deployed AI systems
OWASP LLM Top 10: LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2024-8939?

CVE-2024-8939 is a medium-severity (CVSS 6.2) denial-of-service vulnerability in the ilab model serve component. The vllm JSON web API mishandles large best_of values: an attacker with local API access can send requests with an inflated best_of parameter, consuming compute resources until the inference service becomes unresponsive. Patch ilab, or as a stopgap, cap best_of at the API gateway and enforce per-request timeouts.

Is CVE-2024-8939 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2024-8939, increasing the risk of exploitation.

How to fix CVE-2024-8939?

  1. Apply the Red Hat patch referenced in the advisory (bugzilla.redhat.com/2312782).
  2. Enforce a hard cap on best_of at the API gateway or reverse proxy layer; reject any request with best_of > 5.
  3. Configure per-request inference timeouts at the vllm level to bound maximum resource consumption.
  4. Apply rate limiting per client IP or API token on all inference endpoints.
  5. Restrict network access to the vllm API to authorized internal clients only (firewall or network policy).
  6. Alert on sustained CPU/GPU utilization spikes above baseline on inference nodes as an early DoS detection signal.

What systems are affected by CVE-2024-8939?

This vulnerability affects the following AI/ML architecture patterns: model serving, LLM inference endpoints, RAG pipelines, agent frameworks.

What is the CVSS score for CVE-2024-8939?

CVE-2024-8939 has a CVSS v3.1 base score of 6.2 (MEDIUM). The EPSS exploitation probability is 0.03%.

Technical Details

NVD Description

A vulnerability was found in the ilab model serve component, where improper handling of the best_of parameter in the vllm JSON web API can lead to a Denial of Service (DoS). The API used for LLM-based sentence or chat completion accepts a best_of parameter to return the best completion from several options. When this parameter is set to a large value, the API does not handle timeouts or resource exhaustion properly, allowing an attacker to cause a DoS by consuming excessive system resources. This leads to the API becoming unresponsive, preventing legitimate users from accessing the service.

Exploitation Scenario

An attacker with access to the internal network or a compromised container sharing the namespace sends repeated POST requests to the vllm /v1/chat/completions endpoint with best_of set to an extreme value such as 500 or 1000. Each request forces vllm to generate and internally score hundreds of parallel completions before returning the best result. Within minutes, GPU VRAM is saturated and CPU threads are exhausted. The API stops responding to legitimate requests. Downstream applications — RAG pipelines, AI agents, user-facing chatbots — start throwing timeouts or errors. The attacker can sustain this with minimal tooling (a single curl loop), and the service does not self-recover without operator intervention.
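The traffic in this scenario is hard to distinguish from legitimate use except by the best_of field itself. A hedged sketch of the request body involved (model name and helper name are illustrative assumptions, not from the advisory):

```python
import json

def build_completion_request(prompt: str, best_of: int) -> bytes:
    """Serialize a chat-completion request body.

    The only field separating DoS traffic from benign traffic is best_of:
    attack traffic uses values like 500-1000, benign clients use small ones.
    """
    body = {
        "model": "example-model",  # illustrative model name
        "messages": [{"role": "user", "content": prompt}],
        "best_of": best_of,
    }
    return json.dumps(body).encode("utf-8")
```

Because a single field distinguishes the attack, a gateway-side cap on best_of is a more reliable control than volumetric rate limiting alone, which a slow, sustained loop can stay under.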

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
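The vector string encodes the same metrics listed under Attack Surface. A minimal parser (the helper name is ours, not from any CVSS library) makes the mapping explicit:

```python
def parse_cvss_vector(vector: str) -> dict:
    """Split a CVSS v3.x vector string into metric/value pairs, e.g. {"AV": "L", ...}."""
    prefix, _, metrics = vector.partition("/")
    if not prefix.startswith("CVSS:"):
        raise ValueError("not a CVSS vector string")
    return dict(m.split(":", 1) for m in metrics.split("/"))
```

Applied to this CVE's vector, the parse confirms the profile described above: local vector, no privileges, no confidentiality or integrity impact, high availability impact.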

Timeline

Published
September 17, 2024
Last Modified
September 20, 2024
First Seen
September 17, 2024
