CVE-2025-48956: vLLM: unauthenticated DoS via oversized HTTP header

GHSA-rxc4-3w6r-4v47 · Severity: HIGH · PoC available
Published August 21, 2025
CISO Take

Any vLLM deployment running versions 0.1.0 through 0.10.1.0 can be crashed with a single unauthenticated HTTP request — no credentials or AI knowledge required. Patch to 0.10.1.1 immediately; if not feasible, enforce HTTP header size limits at the reverse proxy or WAF layer before traffic reaches vLLM. This is a trivial attack against one of the most widely deployed LLM inference engines in production.

Risk Assessment

High severity for organizations with exposed vLLM endpoints. Attack complexity is minimal — a single HTTP GET with an oversized header exhausts server memory, requiring no authentication or AI/ML expertise. While EPSS (0.37%) and absence from CISA KEV suggest limited active exploitation today, vLLM's ubiquity in AI production stacks makes this a high-value availability target. Internal-only deployments protected by network segmentation face materially lower risk, but most enterprise AI stacks expose vLLM behind an internal gateway reachable from broad employee networks.

Affected Systems

Package   Ecosystem   Vulnerable Range        Patched
vllm      pip         >= 0.1.0, < 0.10.1.1    0.10.1.1

Severity & Risk

CVSS 3.1: 7.5 / 10
EPSS: 0.3% chance of exploitation in 30 days (higher than 54% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: Medium
Sophistication: Trivial
Exploitation Confidence: Medium (public PoC indexed in trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV: Network
AC: Low
PR: None
UI: None
S: Unchanged
C: None
I: None
A: High

Recommended Action

  1. Patch: Upgrade to vLLM 0.10.1.1 or later (patch commit d8b736f913a59117803d6701521d2e4861701944).

  2. Immediate workaround if patching is delayed: configure reverse proxy (nginx/Envoy/HAProxy) to enforce header size limits — e.g., nginx client_header_buffer_size 4k and large_client_header_buffers 4 8k — to reject oversized requests before they reach vLLM.

  3. Network controls: ensure vLLM HTTP ports (default 8000) are not exposed to untrusted networks; place behind an authenticated gateway or API gateway with rate limiting.

  4. Detection: alert on memory usage spikes in vLLM processes, unexpected service crashes, and HTTP 431 (Request Header Fields Too Large) responses at the proxy layer.

  5. Inventory: audit all internal vLLM deployments — shadow AI projects and dev/staging environments are frequently unpatched.
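The proxy-level workaround from step 2 can be sketched as an nginx server block. The hostname is a hypothetical placeholder; the header-size directives are the ones named above, and the upstream port is vLLM's documented default:

```nginx
# Minimal nginx front-end for a vLLM backend (sketch; adapt the server
# name, TLS setup, and upstream address to your deployment). Requests
# with oversized headers are rejected with HTTP 431 before they reach vLLM.
server {
    listen 443 ssl;
    server_name llm.internal.example;   # hypothetical hostname

    client_header_buffer_size 4k;       # common case: headers up to 4 KiB
    large_client_header_buffers 4 8k;   # hard cap: at most 4 buffers of 8 KiB

    location / {
        proxy_pass http://127.0.0.1:8000;  # vLLM default port
    }
}
```

Pairing this with the HTTP 431 alerting from step 4 turns rejected oversized requests into a detection signal rather than a silent drop.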

CISA SSVC Assessment

Decision Track
Exploitation: none
Automatable: yes
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
8.4 - AI system impact assessment
NIST AI RMF
MANAGE 2.2 - Mechanisms are in place to sustain AI risk and benefit management
OWASP LLM Top 10
LLM10 - Unbounded Consumption

Frequently Asked Questions

What is CVE-2025-48956?

CVE-2025-48956 is an unauthenticated denial-of-service vulnerability in vLLM, the inference and serving engine for large language models. In versions 0.1.0 through 0.10.1.0, a single HTTP GET request with an extremely large header exhausts server memory and can crash the service; no credentials or AI knowledge are required. The fix is in vLLM 0.10.1.1; where patching is delayed, enforce HTTP header size limits at the reverse proxy or WAF layer before traffic reaches vLLM.

Is CVE-2025-48956 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2025-48956, increasing the risk of exploitation.

How to fix CVE-2025-48956?

1. Patch: Upgrade to vLLM 0.10.1.1 or later (patch commit d8b736f913a59117803d6701521d2e4861701944).
2. Immediate workaround if patching is delayed: configure reverse proxy (nginx/Envoy/HAProxy) to enforce header size limits — e.g., nginx client_header_buffer_size 4k and large_client_header_buffers 4 8k — to reject oversized requests before they reach vLLM.
3. Network controls: ensure vLLM HTTP ports (default 8000) are not exposed to untrusted networks; place behind an authenticated gateway or API gateway with rate limiting.
4. Detection: alert on memory usage spikes in vLLM processes, unexpected service crashes, and HTTP 431 (Request Header Fields Too Large) responses at the proxy layer.
5. Inventory: audit all internal vLLM deployments — shadow AI projects and dev/staging environments are frequently unpatched.
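The inventory step can be partly automated. The sketch below checks whether the vLLM version installed in the current Python environment falls in the vulnerable range; it assumes plain `X.Y.Z[.W]` version strings with no pre-release tags, and the function names are illustrative:

```python
# Audit sketch for CVE-2025-48956: flag a vulnerable vLLM install.
# Uses only the standard library; assumes plain numeric version strings.

def parse_version(v: str) -> tuple:
    """Turn '0.10.1' into (0, 10, 1) for tuple comparison."""
    return tuple(int(part) for part in v.split("."))

def is_vulnerable(installed: str) -> bool:
    """CVE-2025-48956 affects vllm >= 0.1.0 and < 0.10.1.1."""
    ver = parse_version(installed)
    return parse_version("0.1.0") <= ver < parse_version("0.10.1.1")

if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version
    try:
        v = version("vllm")
        print(f"vllm {v}: {'VULNERABLE' if is_vulnerable(v) else 'patched'}")
    except PackageNotFoundError:
        print("vllm is not installed in this environment")
```

Running this in each environment (including dev/staging boxes) gives a quick first pass; it does not replace a full dependency scan, since vLLM may also arrive via container images.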

What systems are affected by CVE-2025-48956?

This vulnerability affects the following AI/ML architecture patterns: model serving, LLM inference endpoints, RAG pipelines, agent frameworks, AI APIs.

What is the CVSS score for CVE-2025-48956?

CVE-2025-48956 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 0.31%.

Technical Details

NVD Description

vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.10.1.1, a Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory exhaustion, potentially leading to a crash or unresponsiveness. The attack does not require authentication, making it exploitable by any remote user. This vulnerability is fixed in 0.10.1.1.

Exploitation Scenario

An adversary identifies an exposed vLLM HTTP endpoint via Shodan, internal network scan, or API documentation leak (default port 8000). They craft a single HTTP GET request to any vLLM endpoint — such as /v1/models or /health — with a multi-megabyte header (e.g., a Cookie or X-Custom-Header padded to tens of megabytes). vLLM processes the header without enforcing size bounds, allocating unbounded memory until the process exhausts available RAM and crashes or becomes unresponsive. All dependent AI services — chatbots, RAG pipelines, agent workflows — fail simultaneously with no failover unless replicas are deployed. No credentials, exploits, or AI knowledge required; a curl one-liner suffices.
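The proxy-side mitigation can also be mirrored as an application-side guard, as defense in depth. This is an illustrative sketch, not part of vLLM or its fix; the 32 KiB cap and function name are assumptions:

```python
# Defense-in-depth sketch: reject a request whose combined header bytes
# exceed a cap, mirroring what a proxy's HTTP 431 response would do.
# MAX_HEADER_BYTES is an illustrative value, not taken from the vLLM patch.
MAX_HEADER_BYTES = 32 * 1024  # 32 KiB total across all header names + values

def headers_too_large(headers: dict) -> bool:
    """Return True when name+value bytes across all headers exceed the cap."""
    total = sum(len(k.encode()) + len(v.encode()) for k, v in headers.items())
    return total > MAX_HEADER_BYTES

# A benign request passes; a multi-megabyte Cookie (the attack shape
# described above) is rejected before any further processing.
ok = {"host": "llm.internal", "accept": "application/json"}
attack = {"cookie": "A" * (10 * 1024 * 1024)}  # ~10 MiB header
assert not headers_too_large(ok)
assert headers_too_large(attack)
```

The key property is that the check runs before the server buffers the full header set; a guard applied after buffering would not prevent the memory exhaustion.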

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
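The 7.5 base score can be reproduced from this vector using the CVSS v3.1 formula, with metric weights taken from the FIRST specification:

```python
# Recompute the CVSS v3.1 base score for AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
# using the formula and weights from the FIRST CVSS v3.1 specification.
import math

def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest value with one decimal place >= x."""
    i = round(x * 100000)
    return i / 100000 if i % 10000 == 0 else (math.floor(i / 10000) + 1) / 10

# Weights for this vector: AV:N=0.85, AC:L=0.77, PR:N=0.85, UI:N=0.85;
# C:N=0, I:N=0, A:H=0.56; Scope is Unchanged.
exploitability = 8.22 * 0.85 * 0.77 * 0.85 * 0.85
iss = 1 - (1 - 0) * (1 - 0) * (1 - 0.56)
impact = 6.42 * iss                         # Scope Unchanged branch
base = roundup(min(impact + exploitability, 10))
print(base)  # 7.5
```

Only the Availability metric contributes to impact (C:N, I:N), which is why a trivially exploitable, unauthenticated network flaw still lands at 7.5 rather than in the critical range.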

Timeline

Published
August 21, 2025
Last Modified
October 9, 2025
First Seen
August 21, 2025

Related Vulnerabilities