CVE-2024-8768: vLLM: unauthenticated DoS via empty completion prompt
Severity: HIGH. PoC available. CISA SSVC decision: TRACK*.
Any vLLM inference server reachable over the network can be crashed with a single malformed request, with no credentials required. If your AI stack uses vLLM for model serving (directly or via an agent framework), upgrade immediately to the fixed version or add upstream input validation that blocks empty prompt payloads. The flaw is trivially weaponizable as an availability attack against production inference endpoints.
What is the risk?
High risk for organizations running exposed vLLM inference APIs. The CVSS score is 7.5, and the vector components AV:N/AC:L/PR:N/UI:N mean that any network-reachable instance can be exploited by an unauthenticated attacker with no special skill and no user interaction. vLLM is widely deployed as the inference backend for production LLM APIs, agent frameworks, and RAG pipelines, which broadens the attack surface. The CVE is not listed in CISA's KEV catalog, but the attack is trivial to reproduce, so any unpatched, reachable instance should be treated as likely to be hit.
What should I do?
1. PATCH: Upgrade vLLM immediately to the version containing the fix from PR #7746. Check the vLLM release notes for the patched version tag.
2. WORKAROUND (if patching is delayed): Deploy an API gateway or reverse proxy (nginx, Envoy, AWS API Gateway) upstream of vLLM that checks that prompt fields are non-empty before forwarding requests. A simple rule that returns a 400 response for an empty or null prompt is sufficient.
3. NETWORK CONTROLS: Restrict vLLM API access to authenticated internal services only. vLLM should never be internet-facing without an auth layer.
4. DETECTION: Alert on HTTP 500 responses from vLLM endpoints and on process restart events for the vLLM server process. Repeated 500s from the same source IP are a strong signal.
5. RATE LIMITING: Implement per-client rate limiting at the API gateway layer to limit the blast radius of abuse.
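The workaround and rate-limiting steps above can be sketched as a small pre-check that a gateway runs before forwarding a request to /v1/completions. This is a minimal illustration, not vLLM or gateway code: `CompletionGate`, `TokenBucket`, `prompt_is_empty`, the default limits, and the response messages are all hypothetical names and values chosen for the example. The advisory only says that an "empty prompt" crashes the server, so the check is deliberately conservative and also rejects missing prompts and empty items inside a batched prompt list.

```python
import json
import time
from typing import Callable, Dict, Tuple


def prompt_is_empty(prompt: object) -> bool:
    """Return True for a missing/null prompt, "" or [], or a list with an empty item."""
    if prompt is None:
        return True
    if isinstance(prompt, str):
        return prompt == ""
    if isinstance(prompt, list):
        # Covers [], [""] and [[]] (batched prompts or token-ID lists).
        return len(prompt) == 0 or any(
            isinstance(item, (str, list)) and len(item) == 0 for item in prompt
        )
    return False  # other types are left for the backend's own schema validation


class TokenBucket:
    """Per-client token bucket: `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float]):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class CompletionGate:
    """Hypothetical gateway pre-check for completion requests."""

    def __init__(self, rate: float = 5.0, burst: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self._rate, self._burst, self._clock = rate, burst, clock
        self._buckets: Dict[str, TokenBucket] = {}

    def check(self, client_id: str, raw_body: bytes) -> Tuple[int, str]:
        """Return (status, reason); only a 200 result should be forwarded."""
        if client_id not in self._buckets:
            self._buckets[client_id] = TokenBucket(self._rate, self._burst, self._clock)
        if not self._buckets[client_id].allow():
            return 429, "rate limit exceeded"
        try:
            body = json.loads(raw_body)
        except ValueError:  # includes JSONDecodeError and UnicodeDecodeError
            return 400, "body is not valid JSON"
        if not isinstance(body, dict) or prompt_is_empty(body.get("prompt")):
            return 400, "prompt must be non-empty"
        return 200, "forward"
```

A caller would key `client_id` on whatever identity the gateway already has (API key, service account, or source IP) and apply the prompt check only to routes that carry a `prompt` field. Rejected requests still consume a token here, so a client that keeps sending empty prompts is throttled as well.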
What does CISA's SSVC say?
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
Is CVE-2024-8768 actively exploited?
CVE-2024-8768 is not listed in CISA's Known Exploited Vulnerabilities (KEV) catalog. However, proof-of-concept exploit code is publicly available, which increases the risk of exploitation.
What systems are affected by CVE-2024-8768?
This vulnerability affects the following AI/ML architecture patterns: LLM inference endpoints, model serving, agent frameworks, RAG pipelines, AI API gateways.
What is the CVSS score for CVE-2024-8768?
CVE-2024-8768 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 0.65%.
What is the AI security impact?
MITRE ATLAS Techniques
- AML.T0029 Denial of AI Service
- AML.T0034 Cost Harvesting
- AML.T0040 AI Model Inference API Access
- AML.T0049 Exploit Public-Facing Application
What are the technical details?
Original Advisory
A flaw was found in the vLLM library. A completions API request with an empty prompt will crash the vLLM API server, resulting in a denial of service.
Exploitation Scenario
An adversary identifies a vLLM-powered inference endpoint (via DNS enumeration, Shodan, or internal network scanning) and sends a POST request to /v1/completions with an empty string or null prompt field. The request hits a reachable assertion (CWE-617) and the server process crashes. The attacker can repeat the request in a loop to prevent service recovery. For organizations where vLLM backs a customer-facing AI product or internal copilot, the result is an immediate and sustained outage. The attack requires no authentication, no AI/ML knowledge, and no special tooling: a single curl command is sufficient.
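The looped attack in this scenario is what the detection guidance targets: repeated HTTP 500 responses tied to one source IP. The sketch below shows one way to flag that pattern with a sliding time window. The event format (timestamp, source IP, status), the function name `flag_repeated_500s`, and the default threshold and window are assumptions made for illustration; real access logs need their own parsing. Depending on the proxy in front of vLLM, a crashed backend may surface as 502 or 503 instead of 500, so the watched status codes are a parameter.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, FrozenSet, Iterable, List, Tuple

Event = Tuple[float, str, int]  # (timestamp in seconds, source IP, HTTP status)


def flag_repeated_500s(
    events: Iterable[Event],
    threshold: int = 3,
    window_s: float = 60.0,
    statuses: FrozenSet[int] = frozenset({500}),
) -> List[str]:
    """Return source IPs with at least `threshold` error responses inside
    any `window_s`-second window. Events must be in chronological order."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    flagged: List[str] = []
    for ts, ip, status in events:
        if status not in statuses:
            continue
        window = recent[ip]
        window.append(ts)
        # Drop errors that have fallen out of the sliding window.
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) >= threshold and ip not in flagged:
            flagged.append(ip)
    return flagged
```

In practice the same counting would usually live in a log pipeline or alerting rule, paired with an alert on restarts of the vLLM server process.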
Weaknesses (CWE)
CWE-617 — Reachable Assertion: The product contains an assert() or similar statement that can be triggered by an attacker, which leads to an application exit or other behavior that is more severe than necessary.
- [Implementation] Make sensitive open/close operations unreachable from directly user-controlled data (e.g. opening or closing resources).
- [Implementation] Perform input validation on user data.
Source: MITRE CWE corpus.
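To make the weakness class concrete, the toy example below contrasts an internal assertion that user input can reach with an explicit validation step that turns the same input into a client error. It is not vLLM's code: `schedule`, `handle_unsafe`, and `handle_safe` are hypothetical functions written only to illustrate CWE-617 and the "perform input validation" mitigation.

```python
from typing import List, Tuple


def schedule(prompt_token_ids: List[int]) -> int:
    """Internal routine that assumes its caller already validated the input."""
    # This invariant becomes a reachable assertion if callers pass user data through.
    assert len(prompt_token_ids) > 0, "prompt must contain at least one token"
    return len(prompt_token_ids)


def handle_unsafe(prompt_token_ids: List[int]) -> Tuple[int, str]:
    # User-controlled input flows straight into the assertion: an empty prompt
    # raises AssertionError, which can take the worker down if nothing catches it.
    return 200, f"scheduled {schedule(prompt_token_ids)} tokens"


def handle_safe(prompt_token_ids: List[int]) -> Tuple[int, str]:
    # Validate at the trust boundary and answer with a client error instead.
    if not prompt_token_ids:
        return 400, "prompt must not be empty"
    return 200, f"scheduled {schedule(prompt_token_ids)} tokens"
```

The assertion can stay as an internal sanity check. What matters is that no request can reach it without passing validation first.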
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
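As a cross-check, the 7.5 base score can be recomputed from this vector with the CVSS v3.1 base-score equations. The sketch below is a simplified calculator, not a full implementation: it handles only Scope Unchanged (S:U) vectors, which is the case here, and the function names are chosen for this example.

```python
import math

# CVSS v3.1 metric weights (PR values are the Scope Unchanged ones).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score of a CVSS:3.1 vector with Scope Unchanged."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    metrics = dict(part.split(":") for part in parts)
    if metrics.get("S") != "U":
        raise NotImplementedError("this sketch only covers S:U")
    w = {name: table[metrics[name]] for name, table in WEIGHTS.items()}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # 7.5
```

With C:N and I:N the impact term comes entirely from A:H (6.42 × 0.56 ≈ 3.60). The exploitability term is at its maximum for a network, low-complexity, unauthenticated attack (≈ 3.89), and the sum of about 7.48 rounds up to 7.5.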
Related Vulnerabilities
- CVE-2026-33660, 10.0, TensorFlow: type confusion NPD in tensor conversion (same attack type: DoS)
- CVE-2023-25668, 9.8, TensorFlow: unauthenticated RCE via heap buffer overflow (same attack type: DoS)
- CVE-2022-23587, 9.8, TensorFlow: integer overflow in Grappler enables RCE (same attack type: DoS)
- CVE-2022-35939, 9.8, TensorFlow: ScatterNd OOB write enables RCE/crash (same attack type: DoS)
- CVE-2022-41900, 9.8, TensorFlow: heap OOB RCE in FractionalMaxPool op (same attack type: DoS)