CVE-2024-8939 — MEDIUM (CVSS 6.2) AI Security Vulnerability

CISO Take

An attacker with local API access can make your vllm inference server unresponsive by sending requests with an inflated best_of parameter, which consume compute resources until the service stops answering. Patch ilab immediately if you run model serve in any shared or multi-tenant environment. If you cannot patch right away, cap best_of at the API gateway and enforce per-request timeouts.

Risk Assessment

The CVSS score is medium (6.2) largely because the attack vector is local (AV:L), which limits real-world exposure to contexts where the vllm API is reachable from localhost or an internal network. In containerized AI serving clusters, shared dev boxes, or misconfigured Kubernetes namespaces, that boundary is often weak or absent. Attack complexity is low, no credentials are required, and exploitation is trivial: a single crafted request can trigger cascading resource exhaustion. Loss of availability on inference infrastructure is high-impact in any production LLM serving stack.

Severity & Risk

CVSS 3.1

6.2 / 10

EPSS

0.03%

chance of exploitation in 30 days

Higher than 7% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Trivial

Exploitation Confidence

medium

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV Local

AC Low

PR None

UI None

S Unchanged

C None

I None

A High
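
The 6.2 base score can be reproduced from the metric values listed above. The following is a minimal sketch of the CVSS v3.1 base-score formula for the Scope: Unchanged case, using the numeric weights the specification assigns to each value. The function and constant names are illustrative only.

```python
import math

# CVSS v3.1 weights for the metric values listed above (Scope: Unchanged).
AV_LOCAL = 0.55
AC_LOW = 0.77
PR_NONE = 0.85
UI_NONE = 0.85
C_NONE = 0.0
I_NONE = 0.0
A_HIGH = 0.56


def roundup(value: float) -> float:
    """Round up to one decimal place, following the CVSS v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for vectors with Scope: Unchanged."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)    # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_LOCAL, AC_LOW, PR_NONE, UI_NONE, C_NONE, I_NONE, A_HIGH
)
print(score)  # 6.2
```

Impact comes out at about 3.6 and exploitability at about 2.5, so the score is driven mostly by the High availability impact, while the Local attack vector keeps the exploitability term low.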

Recommended Action

6 steps

Apply the Red Hat patch referenced in the advisory (bugzilla.redhat.com/2312782).
Enforce a hard cap on best_of at the API gateway or reverse proxy layer — reject any request with best_of > 5.
Configure per-request inference timeouts at the vllm level to bound maximum resource consumption.
Apply rate limiting per client IP or API token on all inference endpoints.
Restrict network access to the vllm API to authorized internal clients only (firewall or network policy).
Alert on sustained CPU/GPU utilization spikes above baseline on inference nodes as an early DoS detection signal.
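
The best_of cap in the second step can be sketched as a small validation function that a gateway or proxy hook would call before forwarding a request. This is an illustration under assumptions, not the Red Hat fix: the function name `check_best_of` and the constant `MAX_BEST_OF` are hypothetical, and the limit of 5 is simply the value suggested above.

```python
import json

MAX_BEST_OF = 5  # cap suggested in the steps above; tune for your workload


def check_best_of(raw_body: bytes, max_best_of: int = MAX_BEST_OF) -> tuple[bool, str]:
    """Return (allowed, reason) for a JSON completion request body."""
    try:
        payload = json.loads(raw_body)
    except (ValueError, UnicodeDecodeError):
        return False, "body is not valid JSON"
    if not isinstance(payload, dict):
        return False, "body must be a JSON object"

    best_of = payload.get("best_of")
    if best_of is None:
        return True, "best_of not set"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(best_of, bool) or not isinstance(best_of, int):
        return False, "best_of must be an integer"
    if best_of < 1 or best_of > max_best_of:
        return False, f"best_of must be between 1 and {max_best_of}"
    return True, "ok"


print(check_best_of(b'{"prompt": "hi", "best_of": 1000}'))
print(check_best_of(b'{"prompt": "hi", "best_of": 3}'))
```

Rejecting at the edge means an oversized request never reaches the inference engine, which is why this check is useful even before the patch is applied.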

CISA SSVC Assessment

Decision Track

Exploitation none

Automatable No

Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

DoS

Inference API

AML.T0029 - Denial of AI Service

AML.T0034 - Cost Harvesting

AML.T0040 - AI Model Inference API Access

Compliance Impact

This CVE is relevant to:

EU AI Act

Art. 9 - Risk Management System

ISO 42001

A.9.2 - AI System Availability and Resilience

NIST AI RMF

MANAGE-2.2 - Mechanisms for sustaining value of deployed AI systems

OWASP LLM Top 10

LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2024-8939?

CVE-2024-8939 is a denial-of-service vulnerability in the ilab model serve component. The vllm JSON web API handles the best_of parameter improperly, so an attacker with local API access can send requests with an inflated value and consume compute resources until the service becomes unresponsive. Patch ilab immediately if you run model serve in any shared or multi-tenant environment. If you cannot patch right away, cap best_of at the API gateway and enforce per-request timeouts.

Is CVE-2024-8939 actively exploited?

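No active exploitation has been reported: the CISA SSVC assessment lists exploitation as none, and the EPSS probability is 0.03%. However, proof-of-concept exploit code is publicly indexed (trickest/cve), which lowers the barrier to exploitation.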

How to fix CVE-2024-8939?

1. Apply the Red Hat patch referenced in the advisory (bugzilla.redhat.com/2312782).
2. Enforce a hard cap on best_of at the API gateway or reverse proxy layer: reject any request with best_of > 5.
3. Configure per-request inference timeouts at the vllm level to bound maximum resource consumption.
4. Apply rate limiting per client IP or API token on all inference endpoints.
5. Restrict network access to the vllm API to authorized internal clients only (firewall or network policy).
6. Alert on sustained CPU/GPU utilization spikes above baseline on inference nodes as an early DoS detection signal.
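
Per-client rate limiting is commonly implemented as a token bucket. The sketch below is one possible in-memory version, assuming a single gateway process; the class name `TokenBucketLimiter` and its parameters are hypothetical, and a production deployment would more likely rely on the rate-limiting feature of the gateway itself.

```python
import time


class TokenBucketLimiter:
    """In-memory token bucket keyed by client ID (IP address or API token)."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec   # tokens refilled per second
        self.burst = burst         # maximum tokens a client can accumulate
        self.clock = clock         # injectable clock, useful for testing
        self._state = {}           # client_id -> (tokens, last_seen)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        tokens, last = self._state.get(client_id, (float(self.burst), now))
        # Refill according to elapsed time, never exceeding the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[client_id] = (tokens - 1.0, now)
            return True
        self._state[client_id] = (tokens, now)
        return False


limiter = TokenBucketLimiter(rate_per_sec=1.0, burst=2)
print([limiter.allow("10.0.0.7") for _ in range(3)])
```

Rate limiting alone does not stop a single request with a very large best_of, so it complements the cap in step 2 rather than replacing it.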

What systems are affected by CVE-2024-8939?

This vulnerability affects the following AI/ML architecture patterns: model serving, LLM inference endpoints, RAG pipelines, agent frameworks.

What is the CVSS score for CVE-2024-8939?

CVE-2024-8939 has a CVSS v3.1 base score of 6.2 (MEDIUM). The EPSS exploitation probability is 0.03%.

Technical Details

NVD Description

A vulnerability was found in the ilab model serve component, where improper handling of the best_of parameter in the vllm JSON web API can lead to a Denial of Service (DoS). The API used for LLM-based sentence or chat completion accepts a best_of parameter to return the best completion from several options. When this parameter is set to a large value, the API does not handle timeouts or resource exhaustion properly, allowing an attacker to cause a DoS by consuming excessive system resources. This leads to the API becoming unresponsive, preventing legitimate users from accessing the service.

Exploitation Scenario

An attacker with access to the internal network or a compromised container sharing the namespace sends repeated POST requests to the vllm /v1/chat/completions endpoint with best_of set to an extreme value such as 500 or 1000. Each request forces vllm to generate and internally score hundreds of parallel completions before returning the best result. Within minutes, GPU VRAM is saturated and CPU threads are exhausted. The API stops responding to legitimate requests. Downstream applications — RAG pipelines, AI agents, user-facing chatbots — start throwing timeouts or errors. The attacker can sustain this with minimal tooling (a single curl loop), and the service does not self-recover without operator intervention.
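
The amplification in this scenario can be put in rough numbers. The sketch below assumes, for illustration only, that each of the best_of candidates may run to the full max_tokens limit; the 256-token limit is a hypothetical value, and actual cost depends on batching and on how the engine shares prompt state across candidates.

```python
def generated_tokens(best_of: int, max_tokens: int) -> int:
    """Upper bound on tokens generated for one request:
    best_of candidates, each up to max_tokens long."""
    return best_of * max_tokens


MAX_TOKENS = 256  # hypothetical per-completion limit

baseline = generated_tokens(1, MAX_TOKENS)
inflated = generated_tokens(1000, MAX_TOKENS)

print(baseline)              # 256
print(inflated)              # 256000
print(inflated // baseline)  # 1000
```

Under these assumptions a single request with best_of set to 1000 asks for as much generation work as a thousand ordinary requests, which is why a simple loop is enough to saturate a node.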