CVE-2025-48956: vLLM: unauthenticated DoS via oversized HTTP header
GHSA-rxc4-3w6r-4v47 · HIGH · PoC available
Any vLLM deployment running versions 0.1.0 through 0.10.1.0 can be crashed with a single unauthenticated HTTP request — no credentials or AI knowledge required. Patch to 0.10.1.1 immediately; if not feasible, enforce HTTP header size limits at the reverse proxy or WAF layer before traffic reaches vLLM. This is a trivial attack against one of the most widely deployed LLM inference engines in production.
Risk Assessment
High severity for organizations with exposed vLLM endpoints. Attack complexity is minimal — a single HTTP GET with an oversized header exhausts server memory, requiring no authentication or AI/ML expertise. While EPSS (0.37%) and absence from CISA KEV suggest limited active exploitation today, vLLM's ubiquity in AI production stacks makes this a high-value availability target. Internal-only deployments protected by network segmentation face materially lower risk, but most enterprise AI stacks expose vLLM behind an internal gateway reachable from broad employee networks.
Recommended Action
5 steps:
1. Patch: Upgrade to vLLM 0.10.1.1 or later (patch commit d8b736f913a59117803d6701521d2e4861701944).
2. Immediate workaround if patching is delayed: configure the reverse proxy (nginx/Envoy/HAProxy) to enforce header size limits — e.g., nginx client_header_buffer_size 4k and large_client_header_buffers 4 8k — so oversized requests are rejected before they reach vLLM (a config sketch follows this list).
3. Network controls: ensure vLLM HTTP ports (default 8000) are not exposed to untrusted networks; place the service behind an authenticated API gateway with rate limiting.
4. Detection: alert on memory usage spikes in vLLM processes, unexpected service crashes, and HTTP 431 (Request Header Fields Too Large) responses at the proxy layer.
5. Inventory: audit all internal vLLM deployments — shadow AI projects and dev/staging environments are frequently unpatched.
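For teams taking the step 2 workaround, the snippet below is a minimal sketch of where those nginx directives can live. The upstream address, listener, and buffer sizes are illustrative placeholders rather than tuned values; Envoy and HAProxy expose equivalent header-size limits.

```nginx
# Sketch only: nginx reverse proxy in front of vLLM enforcing request-header limits.
# Addresses and ports are placeholders for your environment.
events {}

http {
    upstream vllm_backend {
        server 127.0.0.1:8000;            # placeholder: internal vLLM instance
    }

    server {
        listen 8443;                       # placeholder listener (terminate TLS here in practice)
        client_header_buffer_size 4k;      # buffer for typical request headers
        large_client_header_buffers 4 8k;  # hard cap; larger headers are rejected before vLLM

        location / {
            proxy_pass http://vllm_backend;
        }
    }
}
```

Because the limit is enforced at the edge, an oversized request is dropped by the proxy and never consumes memory inside the vLLM process.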
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Classification
Compliance Impact
Frequently Asked Questions
What is CVE-2025-48956?
Any vLLM deployment running versions 0.1.0 through 0.10.1.0 can be crashed with a single unauthenticated HTTP request — no credentials or AI knowledge required. Patch to 0.10.1.1 immediately; if not feasible, enforce HTTP header size limits at the reverse proxy or WAF layer before traffic reaches vLLM. This is a trivial attack against one of the most widely deployed LLM inference engines in production.
Is CVE-2025-48956 actively exploited?
CVE-2025-48956 is not currently listed in CISA's Known Exploited Vulnerabilities catalog and its EPSS score is low, which suggests limited active exploitation so far. However, proof-of-concept exploit code is publicly available, increasing the risk of exploitation.
How to fix CVE-2025-48956?
1. Patch: Upgrade to vLLM 0.10.1.1 or later (patch commit d8b736f913a59117803d6701521d2e4861701944). 2. Immediate workaround if patching is delayed: configure reverse proxy (nginx/Envoy/HAProxy) to enforce header size limits — e.g., nginx client_header_buffer_size 4k and large_client_header_buffers 4 8k — to reject oversized requests before they reach vLLM. 3. Network controls: ensure vLLM HTTP ports (default 8000) are not exposed to untrusted networks; place behind an authenticated gateway or API gateway with rate limiting. 4. Detection: alert on memory usage spikes in vLLM processes, unexpected service crashes, and HTTP 431 (Request Header Fields Too Large) responses at the proxy layer. 5. Inventory: audit all internal vLLM deployments — shadow AI projects and dev/staging environments are frequently unpatched.
What systems are affected by CVE-2025-48956?
This vulnerability affects the following AI/ML architecture patterns: model serving, LLM inference endpoints, RAG pipelines, agent frameworks, AI APIs.
What is the CVSS score for CVE-2025-48956?
CVE-2025-48956 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 0.31%.
Technical Details
NVD Description
vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.10.1.1, a Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory exhaustion, potentially leading to a crash or unresponsiveness. The attack does not require authentication, making it exploitable by any remote user. This vulnerability is fixed in 0.10.1.1.
Exploitation Scenario
An adversary identifies an exposed vLLM HTTP endpoint via Shodan, internal network scan, or API documentation leak (default port 8000). They craft a single HTTP GET request to any vLLM endpoint — such as /v1/models or /health — with a multi-megabyte header (e.g., a Cookie or X-Custom-Header padded to tens of megabytes). vLLM processes the header without enforcing size bounds, allocating unbounded memory until the process exhausts available RAM and crashes or becomes unresponsive. All dependent AI services — chatbots, RAG pipelines, agent workflows — fail simultaneously with no failover unless replicas are deployed. No credentials, exploit tooling, or AI expertise are required; a curl one-liner suffices.
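To check exposure in an environment you own, a request with a padded header can be sent directly and the response observed. The snippet below is a minimal sketch using Python's standard library; the host, port, path, and padding size are placeholders, and it should only be pointed at lab instances you control, since on an unpatched vLLM a sufficiently large header can exhaust server memory.

```python
# Lab-only check: does this endpoint accept an oversized request header?
# Run ONLY against a vLLM deployment (or its reverse proxy) that you own.
# Host, port, and header size are placeholders, not values from the advisory.
import http.client

HOST = "127.0.0.1"        # placeholder: lab vLLM instance or its proxy
PORT = 8000               # vLLM's default HTTP port
HEADER_BYTES = 64 * 1024  # deliberately modest; the attack described uses multi-megabyte headers

conn = http.client.HTTPConnection(HOST, PORT, timeout=10)
conn.putrequest("GET", "/health")
conn.putheader("X-Padding", "A" * HEADER_BYTES)  # single padded header
conn.endheaders()

resp = conn.getresponse()
# A hardened proxy should reject the request (e.g., 431 or 400) before it reaches vLLM;
# a patched vLLM should answer without buffering the header without bound.
print(resp.status, resp.reason)
conn.close()
```

A quick rejection at the proxy or a normal response from a patched instance indicates the mitigation is in place; a hang or a memory spike in the vLLM process indicates exposure.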
Weaknesses (CWE)
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
References
- github.com/advisories/GHSA-rxc4-3w6r-4v47
- nvd.nist.gov/vuln/detail/CVE-2025-48956
- github.com/vllm-project/vllm/commit/d8b736f913a59117803d6701521d2e4861701944 (Patch)
- github.com/vllm-project/vllm/pull/23267 (Issue, Patch)
- github.com/vllm-project/vllm/security/advisories/GHSA-rxc4-3w6r-4v47 (Vendor)
- github.com/fkie-cad/nvd-json-data-feeds (Exploit)
Timeline
Related Vulnerabilities
- CVE-2024-9053 (9.8) vllm: RCE via unsafe pickle deserialization in RPC server (same package: vllm)
- CVE-2024-11041 (9.8) vllm: RCE via unsafe pickle deserialization in MessageQueue (same package: vllm)
- CVE-2026-25960 (9.8) vllm: SSRF allows internal network access (same package: vllm)
- CVE-2025-47277 (9.8) vLLM: RCE via exposed TCPStore in distributed inference (same package: vllm)
- CVE-2025-32444 (9.8) vLLM: RCE via pickle deserialization on ZeroMQ (same package: vllm)