CVE-2025-48956: vLLM: unauthenticated DoS via oversized HTTP header

GHSA-rxc4-3w6r-4v47 | Severity: HIGH | PoC available
Published August 21, 2025

CISO Take

Any vLLM deployment running a version from 0.1.0 up to, but not including, 0.10.1.1 can be crashed with a single unauthenticated HTTP request; no credentials or AI knowledge are required. Upgrade to 0.10.1.1 immediately. If that is not feasible, enforce HTTP header size limits at the reverse proxy or WAF layer before traffic reaches vLLM. The attack is trivial, and it targets one of the most widely deployed LLM inference engines in production.

What is the risk?

High severity for organizations with exposed vLLM endpoints. Attack complexity is minimal: a single HTTP GET with an oversized header exhausts server memory, and it requires no authentication or AI/ML expertise. A low EPSS score (0.53%) and absence from CISA KEV suggest limited active exploitation today, but vLLM's ubiquity in AI production stacks makes it a high-value availability target. Internal-only deployments protected by network segmentation face materially lower risk. That protection is often weaker than it looks, because many enterprise AI stacks expose vLLM behind an internal gateway reachable from broad employee networks.

What systems are affected?

Package   Ecosystem   Vulnerable Range        Patched
vLLM      pip         >= 0.1.0, < 0.10.1.1    0.10.1.1
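As a rough aid for checking hosts against the range above, the following sketch compares a version string with the vulnerable range (>= 0.1.0, < 0.10.1.1). The helper names `parse_release`, `is_vulnerable`, and `installed_vllm_version` are hypothetical, and the parser deliberately ignores pre-release and local suffixes such as `rc1` or `+cu118`, which is a simplification rather than full PEP 440 handling.

```python
import re
from importlib import metadata

# Bounds taken from the advisory's affected range: >= 0.1.0, < 0.10.1.1
FIRST_VULNERABLE = (0, 1, 0, 0)
FIRST_PATCHED = (0, 10, 1, 1)


def parse_release(version: str) -> tuple:
    """Return the leading dotted-integer release as a 4-tuple.

    Suffixes such as 'rc1', '.post1' or '+cu118' are ignored (simplification).
    """
    match = re.match(r"\s*v?(\d+(?:\.\d+)*)", version)
    if not match:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(1).split(".")][:4]
    parts += [0] * (4 - len(parts))  # pad '0.10.1' to (0, 10, 1, 0)
    return tuple(parts)


def is_vulnerable(version: str) -> bool:
    """True if the version falls inside the advisory's vulnerable range."""
    return FIRST_VULNERABLE <= parse_release(version) < FIRST_PATCHED


def installed_vllm_version():
    """Return the installed vLLM version string, or None if it is not installed."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vllm is not installed in this environment")
    elif is_vulnerable(found):
        print(f"vllm {found} is in the vulnerable range; upgrade to 0.10.1.1 or later")
    else:
        print(f"vllm {found} is outside the vulnerable range")
```

Running the script inside each Python environment that serves models gives a quick per-host answer; container images and pinned requirements files still need to be audited separately.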

How severe is it?

CVSS 3.1: 7.5 / 10
EPSS: 0.5% chance of exploitation in 30 days (higher than 40% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

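Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High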

What should I do?

  1. Patch: Upgrade to vLLM 0.10.1.1 or later (patch commit d8b736f913a59117803d6701521d2e4861701944).

  2. Immediate workaround if patching is delayed: configure reverse proxy (nginx/Envoy/HAProxy) to enforce header size limits — e.g., nginx client_header_buffer_size 4k and large_client_header_buffers 4 8k — to reject oversized requests before they reach vLLM.

  3. Network controls: ensure vLLM HTTP ports (default 8000) are not exposed to untrusted networks; place behind an authenticated gateway or API gateway with rate limiting.

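  4. Detection: alert on memory usage spikes in vLLM processes, unexpected service crashes, and HTTP 431 (Request Header Fields Too Large) responses at the proxy layer. Note that nginx reports an oversized header field as 400 (Bad Request) rather than 431, so include that status when nginx is the proxy.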

  5. Inventory: audit all internal vLLM deployments — shadow AI projects and dev/staging environments are frequently unpatched.
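The detection step can be approximated by scanning proxy access logs for header-size rejections. The sketch below assumes logs in the common nginx/Apache "combined" layout (client IP first, status code after the quoted request line); the function names and the threshold of 5 are illustrative assumptions, and the set of statuses should be adjusted to the proxy in use.

```python
import re
from collections import Counter

# Assumed "combined" access-log layout:
#   ip - user [time] "request line" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
)


def count_rejections(lines, statuses=frozenset({431})):
    """Count responses with a header-rejection status, grouped by client IP."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and int(match.group("status")) in statuses:
            hits[match.group("ip")] += 1
    return hits


def offenders(hits, threshold=5):
    """Return the client IPs that reached the (illustrative) alert threshold."""
    return sorted(ip for ip, count in hits.items() if count >= threshold)


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [21/Aug/2025:10:00:00 +0000] "GET /v1/models HTTP/1.1" 431 0 "-" "curl/8.0"',
        '198.51.100.2 - - [21/Aug/2025:10:00:01 +0000] "GET /health HTTP/1.1" 200 2 "-" "probe"',
    ]
    print(count_rejections(sample))
```

A few rejections from one address are usually a misbehaving client; a burst from one source, or rejections that coincide with vLLM memory spikes, is the pattern worth alerting on.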

What does CISA's SSVC say?

Decision: Track
Exploitation: none
Automatable: Yes
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Article 15 - Accuracy, robustness and cybersecurity
ISO 42001: 8.4 - AI system impact assessment
NIST AI RMF: MANAGE 2.2 - Mechanisms are in place to sustain AI risk and benefit management
OWASP LLM Top 10: LLM10 - Unbounded Consumption

Frequently Asked Questions

What is CVE-2025-48956?

CVE-2025-48956 is a denial-of-service vulnerability in vLLM, an inference and serving engine for large language models. In versions from 0.1.0 up to, but not including, 0.10.1.1, a single unauthenticated HTTP GET request with an extremely large header can exhaust server memory and crash the server or leave it unresponsive. The issue is fixed in 0.10.1.1. Where upgrading is not immediately feasible, enforce HTTP header size limits at the reverse proxy or WAF layer before traffic reaches vLLM.

Is CVE-2025-48956 actively exploited?

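No active exploitation has been reported: the CVE is absent from CISA KEV, and CISA's SSVC assessment lists exploitation as "none". Proof-of-concept exploit code is publicly available, however, which increases the risk of exploitation.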

How to fix CVE-2025-48956?

  1. Patch: Upgrade to vLLM 0.10.1.1 or later (patch commit d8b736f913a59117803d6701521d2e4861701944).

  2. Immediate workaround if patching is delayed: configure the reverse proxy (nginx/Envoy/HAProxy) to enforce header size limits, e.g., nginx client_header_buffer_size 4k and large_client_header_buffers 4 8k, so that oversized requests are rejected before they reach vLLM.

  3. Network controls: ensure vLLM HTTP ports (default 8000) are not exposed to untrusted networks; place vLLM behind an authenticated gateway or an API gateway with rate limiting.

  4. Detection: alert on memory usage spikes in vLLM processes, unexpected service crashes, and HTTP 431 (Request Header Fields Too Large) responses at the proxy layer. Note that nginx reports an oversized header field as 400 (Bad Request) rather than 431.

  5. Inventory: audit all internal vLLM deployments; shadow AI projects and dev/staging environments are frequently unpatched.

What systems are affected by CVE-2025-48956?

CVE-2025-48956 affects the vLLM pip package in versions >= 0.1.0 and < 0.10.1.1. The affected AI/ML architecture patterns are model serving, LLM inference endpoints, RAG pipelines, agent frameworks, and AI APIs.

What is the CVSS score for CVE-2025-48956?

CVE-2025-48956 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 0.53%.

What is the AI security impact?

Affected AI Architectures

model serving, LLM inference endpoints, RAG pipelines, agent frameworks, AI APIs

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service
AML.T0034 Cost Harvesting
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: 8.4
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM10

What are the technical details?

Original Advisory

vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.10.1.1, a Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory exhaustion, potentially leading to a crash or unresponsiveness. The attack does not require authentication, making it exploitable by any remote user. This vulnerability is fixed in 0.10.1.1.

Exploitation Scenario

An adversary identifies an exposed vLLM HTTP endpoint via Shodan, internal network scan, or API documentation leak (default port 8000). They craft a single HTTP GET request to any vLLM endpoint — such as /v1/models or /health — with a multi-megabyte header (e.g., a Cookie or X-Custom-Header padded to tens of megabytes). vLLM processes the header without enforcing size bounds, allocating unbounded memory until the process exhausts available RAM and crashes or becomes unresponsive. All dependent AI services — chatbots, RAG pipelines, agent workflows — fail simultaneously with no failover unless replicas are deployed. No credentials, exploits, or AI knowledge required; a curl one-liner suffices.
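For defenders who want to confirm that a proxy in front of vLLM rejects requests of this shape, a benign probe is enough; nothing close to a memory-exhausting payload is needed. The sketch below sends one GET with a header value capped at 64 KiB and reports whether it was rejected. The header name `X-Header-Limit-Probe`, the size cap, the set of "rejected" status codes, and the function names are all assumptions made for illustration; the port 8000 and `/health` path come from the scenario above. Run it only against systems you operate.

```python
import http.client

MAX_PROBE_BYTES = 64 * 1024  # hard cap so the probe stays harmless (assumed value)
REJECT_STATUSES = frozenset({400, 413, 431})  # statuses treated as "limit enforced"


def build_probe_headers(value_bytes: int) -> dict:
    """Build a single padded header; refuse sizes above the safety cap."""
    if not 0 < value_bytes <= MAX_PROBE_BYTES:
        raise ValueError(f"value_bytes must be between 1 and {MAX_PROBE_BYTES}")
    return {"X-Header-Limit-Probe": "a" * value_bytes}


def classify(status) -> str:
    """Map an HTTP status (or None when no response arrived) to a verdict."""
    if status is None:
        return "no response"
    return "rejected" if status in REJECT_STATUSES else "accepted"


def probe(host, port=8000, path="/health", value_bytes=32 * 1024, timeout=5.0):
    """Send one GET with a moderately oversized header and classify the reply."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers=build_probe_headers(value_bytes))
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Some proxies close the connection instead of answering.
        status = None
    finally:
        conn.close()
    return classify(status)
```

A call such as `probe("vllm-gateway.internal.example")` (a placeholder hostname) returning "accepted" suggests the header limit is not being enforced on that path. "no response" is ambiguous, since it covers both a proxy that drops the connection and an unreachable host.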

Weaknesses (CWE)

CWE-400 — Uncontrolled Resource Consumption: The product does not properly control the allocation and maintenance of a limited resource.

  • [Architecture and Design] Design throttling mechanisms into the system architecture. The best protection is to limit the amount of resources that an unauthorized user can cause to be expended. A strong authentication and access control model will help prevent such attacks from occurring in the first place. The login application should be protected against DoS attacks as much as possible. Limiting the database access, perhaps by caching result sets, can help minimize the resources expended. To further limit the potential for a DoS attack, consider tracking the rate of requests received from users and blocking requests that exceed a defined rate threshold.
  • [Architecture and Design] Mitigation of resource exhaustion attacks requires that the target system either (1) recognizes the attack and denies that user further access for a given amount of time, or (2) uniformly throttles all requests in order to make it more difficult to consume resources more quickly than they can again be freed. The first of these solutions is an issue in itself though, since it may allow attackers to prevent the use of the system by a particular valid user. If the attacker impersonates the valid user, they may be able to prevent the user from accessing the server in question. The second solution is simply difficult to effectively institute -- and even when properly done, it does not provide a full solution. It simply makes the attack require more resources on the part of the attacker.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
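The 7.5 base score can be reproduced from this vector with the base-metric equations and weights from the CVSS v3.1 specification. The sketch below covers base metrics only (no temporal or environmental metrics); the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is Unchanged (U) or Changed (C).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a base-metric vector string."""
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are handled")
    m = dict(item.split(":") for item in metrics)
    scope = m["S"]

    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15

    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # 7.5
```

For this vector the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.89. Their sum, about 7.48, rounds up to 7.5.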

Timeline

Published: August 21, 2025
Last Modified: October 9, 2025
First Seen: August 21, 2025

Related Vulnerabilities