CVE-2025-48956 — HIGH (CVSS 7.5) AI Security Vulnerability

CISO Take

Any vLLM deployment running a version from 0.1.0 up to, but not including, 0.10.1.1 can be crashed with a single unauthenticated HTTP request; no credentials or AI knowledge are required. Upgrade to 0.10.1.1 immediately; if that is not feasible, enforce HTTP header size limits at the reverse proxy or WAF layer before traffic reaches vLLM. The attack is trivial, and it targets one of the most widely deployed LLM inference engines in production.

Risk Assessment

High severity for organizations with exposed vLLM endpoints. Attack complexity is minimal: a single HTTP GET with an oversized header exhausts server memory, requiring no authentication or AI/ML expertise. While the EPSS score (0.31%) and absence from CISA KEV suggest limited active exploitation today, vLLM's ubiquity in AI production stacks makes it a high-value availability target. Internal-only deployments protected by network segmentation face materially lower risk, but many enterprise AI stacks place vLLM behind an internal gateway that is reachable from broad employee networks.

Affected Systems

Package	Ecosystem	Vulnerable Range	Patched
vllm	pip	>= 0.1.0, < 0.10.1.1	0.10.1.1
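
The vulnerable range in the table can be turned into a simple audit check. The following is a minimal sketch, not an official tool: the helper names (`parse_version`, `is_vulnerable`, `installed_vllm_is_vulnerable`) are hypothetical, and the parser deliberately ignores pre-release or local suffixes such as `rc1`, which is a simplification.

```python
import re
from importlib import metadata

FIRST_VULNERABLE = (0, 1, 0, 0)  # range lower bound: >= 0.1.0
FIRST_PATCHED = (0, 10, 1, 1)    # range upper bound: < 0.10.1.1


def parse_version(text, width=4):
    """Turn '0.10.1' into (0, 10, 1, 0). Suffixes like 'rc1' are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    parts = [int(p) for p in match.group(0).split(".")][:width]
    # Pad with zeros so that '0.1' and '0.1.0' compare as equal.
    return tuple(parts + [0] * (width - len(parts)))


def is_vulnerable(version_text):
    """True if the version falls in >= 0.1.0, < 0.10.1.1."""
    return FIRST_VULNERABLE <= parse_version(version_text) < FIRST_PATCHED


def installed_vllm_is_vulnerable():
    """Check the vllm package in the current environment; None if absent."""
    try:
        return is_vulnerable(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None
```

Running `installed_vllm_is_vulnerable()` inside each serving environment gives a quick yes/no answer; a full inventory would still need to cover container images and environments this check cannot reach.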

Severity & Risk

CVSS 3.1: 7.5 / 10

EPSS: 0.3% chance of exploitation in 30 days (higher than 54% of all CVEs)

Source: EPSS v3 (FIRST.org)

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication: Trivial

Exploitation Confidence: medium

Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

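AV (Attack Vector): Network
AC (Attack Complexity): Low
PR (Privileges Required): None
UI (User Interaction): None
S (Scope): Unchanged
C (Confidentiality): None
I (Integrity): None
A (Availability): High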
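
To show how these metric values produce the 7.5 base score, here is a short sketch of the CVSS v3.1 base-score arithmetic for the scope-unchanged case. The function and variable names are illustrative; the weights and the rounding rule follow the published CVSS v3.1 specification, and the scope-changed formulas are intentionally left out.

```python
import math

# CVSS v3.1 numeric weights (PR values are the scope-unchanged ones).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Smallest number with one decimal place that is >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """Base score for a vector with S:U, given single-letter metric values."""
    w = WEIGHTS
    iss = 1 - (1 - w["CIA"][c]) * (1 - w["CIA"][i]) * (1 - w["CIA"][a])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"][av] * w["AC"][ac] * w["PR"][pr] * w["UI"][ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:N / AC:L / PR:N / UI:N / S:U / C:N / I:N / A:H
print(base_score_scope_unchanged("N", "L", "N", "N", "N", "N", "H"))
```

For this vector the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is about 3.89; their sum, about 7.48, rounds up to 7.5.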

Recommended Action

1. Patch: Upgrade to vLLM 0.10.1.1 or later (patch commit d8b736f913a59117803d6701521d2e4861701944).
2. Immediate workaround if patching is delayed: configure the reverse proxy (nginx, Envoy, or HAProxy) to enforce header size limits, for example nginx client_header_buffer_size 4k and large_client_header_buffers 4 8k, so that oversized requests are rejected before they reach vLLM.
3. Network controls: ensure vLLM HTTP ports (default 8000) are not exposed to untrusted networks; place vLLM behind an authenticated gateway or an API gateway with rate limiting.
4. Detection: alert on memory usage spikes in vLLM processes, unexpected service crashes, and header-size rejections at the proxy layer (HTTP 431 Request Header Fields Too Large; nginx returns 400 when a header exceeds its buffers).
5. Inventory: audit all internal vLLM deployments; shadow AI projects and dev/staging environments are frequently unpatched.
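
The detection step can be prototyped with a small log scan. This is a rough sketch under stated assumptions: the proxy writes access logs in the common or combined log format, the function names are hypothetical, and status 400 is included because some proxies use it for oversized headers even though it also covers unrelated malformed requests, so expect noise.

```python
import re
from collections import Counter

# Start of a common/combined log format line:
#   client ident user [timestamp] "request line" status ...
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3})\b'
)

# Statuses a proxy may return when it rejects an oversized header.
HEADER_REJECTION_STATUSES = {"400", "431"}


def count_header_rejections(lines):
    """Count candidate header-size rejections per client address."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") in HEADER_REJECTION_STATUSES:
            counts[m.group("client")] += 1
    return counts


def clients_over_threshold(lines, threshold=5):
    """Clients with at least `threshold` rejections, sorted by address."""
    counts = count_header_rejections(lines)
    return sorted(c for c, n in counts.items() if n >= threshold)
```

In practice the same counting would more likely live in a log pipeline or SIEM rule; the threshold of 5 is an arbitrary placeholder to tune against normal traffic.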

CISA SSVC Assessment

Decision: Track

Exploitation: none

Automatable: Yes

Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

DoS Inference API Framework
AML.T0029 - Denial of AI Service
AML.T0034 - Cost Harvesting
AML.T0049 - Exploit Public-Facing Application

Compliance Impact

This CVE is relevant to:

EU AI Act: Article 15 - Accuracy, robustness and cybersecurity
ISO 42001: 8.4 - AI system impact assessment
NIST AI RMF: MANAGE 2.2 - Mechanisms are in place to sustain AI risk and benefit management
OWASP LLM Top 10: LLM10 - Unbounded Consumption

Frequently Asked Questions

What is CVE-2025-48956?

CVE-2025-48956 is a denial-of-service vulnerability in vLLM, an inference and serving engine for large language models. In versions from 0.1.0 up to, but not including, 0.10.1.1, a single unauthenticated HTTP GET request with an extremely large header can exhaust server memory and crash the process or leave it unresponsive. It is fixed in 0.10.1.1.

Is CVE-2025-48956 actively exploited?

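No active exploitation has been confirmed: the CVE is absent from CISA KEV, and the CISA SSVC assessment lists exploitation as none. However, proof-of-concept exploit code is publicly available for CVE-2025-48956, which increases the risk of exploitation.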

How to fix CVE-2025-48956?

1. Patch: Upgrade to vLLM 0.10.1.1 or later (patch commit d8b736f913a59117803d6701521d2e4861701944).
2. Immediate workaround if patching is delayed: configure the reverse proxy (nginx, Envoy, or HAProxy) to enforce header size limits, for example nginx client_header_buffer_size 4k and large_client_header_buffers 4 8k, so that oversized requests are rejected before they reach vLLM.
3. Network controls: ensure vLLM HTTP ports (default 8000) are not exposed to untrusted networks; place vLLM behind an authenticated gateway or an API gateway with rate limiting.
4. Detection: alert on memory usage spikes in vLLM processes, unexpected service crashes, and header-size rejections at the proxy layer (HTTP 431 Request Header Fields Too Large; nginx returns 400 when a header exceeds its buffers).
5. Inventory: audit all internal vLLM deployments; shadow AI projects and dev/staging environments are frequently unpatched.

What systems are affected by CVE-2025-48956?

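The vulnerable component is the vllm pip package, versions >= 0.1.0 and < 0.10.1.1. Any AI/ML architecture that serves models through an affected vLLM HTTP endpoint is exposed, including model serving, LLM inference endpoints, RAG pipelines, agent frameworks, and AI APIs.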

What is the CVSS score for CVE-2025-48956?

CVE-2025-48956 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 0.31%.

Technical Details

NVD Description

vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.10.1.1, a Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory exhaustion, potentially leading to a crash or unresponsiveness. The attack does not require authentication, making it exploitable by any remote user. This vulnerability is fixed in 0.10.1.1.

Exploitation Scenario

An adversary identifies an exposed vLLM HTTP endpoint (default port 8000) via Shodan, an internal network scan, or leaked API documentation. They craft a single HTTP GET request to any vLLM endpoint, such as /v1/models or /health, with a multi-megabyte header (e.g., a Cookie or X-Custom-Header padded to tens of megabytes). vLLM processes the header without enforcing size bounds, allocating memory until the process exhausts available RAM and crashes or becomes unresponsive. All dependent AI services (chatbots, RAG pipelines, agent workflows) fail simultaneously, with no failover unless replicas are deployed. No credentials, exploit development, or AI knowledge are required; a curl one-liner suffices.
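
The underlying defense is to cap how many header bytes are buffered before the request is processed. The sketch below illustrates that principle only; it is not vLLM's actual patch, and the names `read_header_block` and `HeaderTooLarge` and the 8192-byte default are assumptions made for the example.

```python
import io


class HeaderTooLarge(Exception):
    """Raised when the header section exceeds the configured cap."""


def read_header_block(stream, max_bytes=8192):
    """Read the request line and headers, buffering at most max_bytes + 1 bytes.

    Stops at the blank line that ends the header section, or at EOF.
    """
    buf = bytearray()
    while True:
        # Never ask for more than one byte past the remaining budget.
        line = stream.readline(max_bytes - len(buf) + 1)
        if not line:
            break  # EOF before the blank line
        buf += line
        if len(buf) > max_bytes:
            raise HeaderTooLarge(f"header section exceeds {max_bytes} bytes")
        if line in (b"\r\n", b"\n"):
            break  # blank line: end of headers
    return bytes(buf)


# A normal request passes; a padded header is refused early.
ok = io.BytesIO(b"GET /health HTTP/1.1\r\nHost: example\r\n\r\n")
print(len(read_header_block(ok)))

padded = io.BytesIO(b"GET / HTTP/1.1\r\nX-Pad: " + b"A" * 100_000 + b"\r\n\r\n")
try:
    read_header_block(padded)
except HeaderTooLarge as exc:
    print("rejected:", exc)
```

Because the read size shrinks with the remaining budget, the oversized request is rejected after roughly `max_bytes` bytes rather than after the whole header has been held in memory, which is the same effect the proxy-level limits aim for.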