CVE-2025-48943 — MEDIUM (CVSS 6.5) AI Security Vulnerability

CISO Take

Any authenticated user of vLLM 0.8.x can crash the entire inference server by submitting a malformed regex as a structured output constraint — no special skills required. This is a shared-resource risk: one bad request takes down inference for all downstream services and agents. Upgrade to vLLM 0.9.0 immediately; if blocked, add regex validation at the API gateway layer.

Risk Assessment

Operationally higher risk than CVSS 6.5 suggests for shared inference infrastructure. Low Complexity + Low Privileges means any authenticated API consumer — including internal developers, CI pipelines, or compromised service accounts — can trigger a full server crash with a single request. EPSS is negligible (0.00084), indicating no observed exploitation yet, but the attack is trivially reproducible once the advisory is public. The sibling vulnerability CVE-2025-48942 (same pattern, JSON schema instead of regex) confirms a systemic input validation gap in vLLM's structured output subsystem.

Affected Systems

Package	Ecosystem	Vulnerable Range	Patched
vllm	pip	—	No patch
78.9K 126 dependents Pushed 6d ago 56% patched ~32d to patch Full package profile →
vllm	pip	>= 0.8.0, < 0.9.0	`0.9.0`
78.9K 126 dependents Pushed 6d ago 56% patched ~32d to patch Full package profile →

Severity & Risk

CVSS 3.1

6.5 / 10

EPSS

0.2%

chance of exploitation in 30 days

Higher than 47% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Trivial

Exploitation Confidence

medium

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV Network

AC Low

PR Low

UI None

S Unchanged

C None

I None

A High

Recommended Action

6 steps

PATCH

Upgrade vLLM to >= 0.9.0 immediately (pip install --upgrade vllm).
WORKAROUND

If upgrade is blocked, add a pre-validation layer that sanitizes regex patterns before forwarding to vLLM — reject patterns exceeding complexity thresholds (e.g., nested quantifiers).
RATE-LIMIT: Apply per-user rate limiting on structured output endpoints to slow down brute-force crash attempts.
DETECT

Monitor for sudden vLLM process restarts or spikes in 5xx errors on structured output endpoints. Alert on repeated server crashes from the same API key/user.
ISOLATE

Run vLLM behind an internal-only API gateway; avoid exposing the inference API directly to untrusted users.
AUDIT

Review whether structured output endpoints are exposed to external or low-trust identities.

CISA SSVC Assessment

Decision Track

Exploitation none

Automatable No

Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

DoS Inference Framework AML.T0029 - Denial of AI Service AML.T0034 - Cost Harvesting AML.T0040 - AI Model Inference API Access AML.T0049 - Exploit Public-Facing Application

Compliance Impact

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2.8 - AI system availability and performance

NIST AI RMF

GOVERN-1.7 - Processes for AI risk identification and response MANAGE-2.2 - Mechanisms to sustain AI system value and manage risks over time

OWASP LLM Top 10

LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2025-48943?

Any authenticated user of vLLM 0.8.x can crash the entire inference server by submitting a malformed regex as a structured output constraint — no special skills required. This is a shared-resource risk: one bad request takes down inference for all downstream services and agents. Upgrade to vLLM 0.9.0 immediately; if blocked, add regex validation at the API gateway layer.

Is CVE-2025-48943 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2025-48943, increasing the risk of exploitation.

How to fix CVE-2025-48943?

1. PATCH: Upgrade vLLM to >= 0.9.0 immediately (pip install --upgrade vllm). 2. WORKAROUND: If upgrade is blocked, add a pre-validation layer that sanitizes regex patterns before forwarding to vLLM — reject patterns exceeding complexity thresholds (e.g., nested quantifiers). 3. RATE-LIMIT: Apply per-user rate limiting on structured output endpoints to slow down brute-force crash attempts. 4. DETECT: Monitor for sudden vLLM process restarts or spikes in 5xx errors on structured output endpoints. Alert on repeated server crashes from the same API key/user. 5. ISOLATE: Run vLLM behind an internal-only API gateway; avoid exposing the inference API directly to untrusted users. 6. AUDIT: Review whether structured output endpoints are exposed to external or low-trust identities.

What systems are affected by CVE-2025-48943?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, structured output pipelines, AI agent frameworks, multi-tenant LLM API gateways, RAG pipelines with constrained generation.

What is the CVSS score for CVE-2025-48943?

CVE-2025-48943 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.24%.

Technical Details

NVD Description

vLLM is an inference and serving engine for large language models (LLMs). Version 0.8.0 up to but excluding 0.9.0 have a Denial of Service (ReDoS) that causes the vLLM server to crash if an invalid regex was provided while using structured output. This vulnerability is similar to GHSA-6qc9-v4r8-22xg/CVE-2025-48942, but for regex instead of a JSON schema. Version 0.9.0 fixes the issue.

Exploitation Scenario

An adversary with low-privilege API access (e.g., a developer API key, a compromised service account, or a malicious internal user) sends a POST request to the vLLM inference endpoint with a carefully crafted regex pattern in the guided generation parameters — such as a pattern with catastrophic backtracking like `(a+)+$`. The vLLM server attempts to compile and validate the regex, triggering exponential backtracking that consumes all CPU and crashes the process. All concurrent inference requests fail. In a Kubernetes environment without proper liveness probes, the pod may hang rather than restart, causing a prolonged outage. An adversary can repeat this attack to maintain denial of service against a critical LLM serving layer.