vLLM's speech-to-text endpoint validates upload size on compressed bytes but never caps the decoded output, so a single valid 25MB OPUS file decodes to roughly 5.7GB of float32 PCM and drives peak memory to about 14.9GB per request: a textbook decompression bomb applied to audio. Any organization running vLLM ≤0.23.0 with the `/v1/audio/transcriptions` endpoint reachable by external or multi-tenant API users is exposed to availability loss across the entire inference server, not just the audio feature, because memory exhaustion crashes the shared process. The attack sits in the top 91st EPSS percentile and requires only low-privilege API access (`PR:L`), so it is reachable on any platform that issues API keys; three to five concurrent malicious uploads are enough to exhaust a typical deployment. Upgrade to vLLM v0.23.1rc0 or later (PR #44970) immediately, or, as a stop-gap, gate the audio endpoint behind strict network-level body size limits and per-user concurrency controls.
What is the risk?
Operational risk exceeds the CVSS 6.5 medium rating for AI inference deployments. The 232x memory amplification ratio means bandwidth is not the bottleneck: a single attacker on a slow connection can force ~5.7GB of decoded PCM per request, and `np.concatenate` then allocates a second contiguous array, pushing peak RSS to ~14.9GB. Multi-tenant vLLM deployments where the audio endpoint is exposed to API key holders are the highest-risk scenario, because the `PR:L` requirement is trivially satisfied by trial or free-tier keys. With 130 downstream dependents and 61 prior CVEs in the same package, vLLM is a high-value infrastructure target for availability attacks against AI inference fleets.
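The amplification figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes mono output at 48 kHz (the rate Opus decoders normally produce) and 4-byte float32 samples; the channel count and sample rate are assumptions rather than values stated in the advisory, and `decoded_pcm_bytes` is a hypothetical helper.

```python
def decoded_pcm_bytes(duration_s, sample_rate_hz=48_000, channels=1, bytes_per_sample=4):
    """Size of raw PCM for a clip, ignoring container and decoder overhead."""
    return round(duration_s * sample_rate_hz) * channels * bytes_per_sample


compressed_bytes = 25 * 1024**2   # the 25MB upload limit
duration_s = 8.7 * 3600           # ~8.7 hours of audio, per the advisory

decoded = decoded_pcm_bytes(duration_s)          # assumed: 48 kHz, mono, float32
amplification = decoded / compressed_bytes

print(f"decoded: {decoded / 1024**3:.1f} GiB, amplification: {amplification:.0f}x")
```

Under these assumptions the result lands within a few percent of the advisory's ~5.7GB and 232x figures, which suggests the ratio is driven almost entirely by clip duration rather than by anything exotic in the file.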
How does the attack unfold?
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vLLM | pip | <= 0.23.0 | v0.23.1rc0 (PR #44970) |
How severe is it?
What is the attack surface?
What should I do?
1. Upgrade vLLM to v0.23.1rc0 or later. The fix (PR #44970, commit 1b1359c) adds a decoded-size budget check before `np.concatenate`.
2. If immediate upgrade is blocked, disable or firewall-restrict the `/v1/audio/transcriptions` endpoint at the reverse proxy level.
3. Apply upstream request body limits (e.g., nginx `client_max_body_size 10m`) independently of vLLM's internal check as defence-in-depth.
4. Enforce per-user concurrency limits on audio upload endpoints to cap simultaneous decompression.
5. Set hard memory limits on vLLM containers (`--memory` in Docker or cgroups) to contain blast radius to a single pod.
6. Monitor for sudden RSS spikes in vLLM containers as a detection signal. A single legitimate 30-second transcription should not cause gigabyte-scale allocation.
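The per-user concurrency step can be sketched as a small admission check placed in front of the audio endpoint. This is a minimal illustration, not part of vLLM: `PerUserLimiter` and the `"key-123"` identifier are hypothetical, and the sketch assumes a single-process, single-threaded (e.g. asyncio) gateway where the check and the increment cannot interleave.

```python
class PerUserLimiter:
    """Caps in-flight audio uploads per API key (hypothetical helper)."""

    def __init__(self, max_concurrent=1):
        self._max = max_concurrent
        self._in_flight = {}  # api key -> number of requests currently decoding

    def try_acquire(self, user_id):
        """Admit the request if the user is under the cap, else refuse it."""
        current = self._in_flight.get(user_id, 0)
        if current >= self._max:
            return False  # caller would typically answer with HTTP 429
        self._in_flight[user_id] = current + 1
        return True

    def release(self, user_id):
        """Call when the request finishes, whether it succeeded or failed."""
        remaining = self._in_flight.get(user_id, 0) - 1
        if remaining > 0:
            self._in_flight[user_id] = remaining
        else:
            self._in_flight.pop(user_id, None)  # avoid unbounded key growth


limiter = PerUserLimiter(max_concurrent=1)
first = limiter.try_acquire("key-123")    # admitted
second = limiter.try_acquire("key-123")   # refused while the first is in flight
limiter.release("key-123")
third = limiter.try_acquire("key-123")    # admitted again after release
```

Rejecting rather than queueing keeps the worst case bounded: with a cap of one, a single key can hold at most one decode's worth of memory at a time. The limiter only helps if `release` runs on every exit path, so it belongs in a `finally` block or equivalent.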
How is it classified?
Frequently Asked Questions
What is CVE-2026-54233?
CVE-2026-54233 is a denial-of-service vulnerability in vLLM ≤0.23.0. The `/v1/audio/transcriptions` endpoint limits the compressed upload size but not the decoded PCM output, so a valid 25MB OPUS file decodes to roughly 5.7GB of float32 PCM and drives peak RSS to about 14.9GB per request. Because memory exhaustion crashes the shared process, a few concurrent uploads from a low-privilege API user (`PR:L`) can take down the entire inference server. The fix (PR #44970) ships in vLLM v0.23.1rc0 and later.
Is CVE-2026-54233 actively exploited?
No confirmed active exploitation of CVE-2026-54233 has been reported, but organizations should still patch proactively.
How to fix CVE-2026-54233?
1. Upgrade vLLM to v0.23.1rc0 or later. The fix (PR #44970, commit 1b1359c) adds a decoded-size budget check before `np.concatenate`.
2. If immediate upgrade is blocked, disable or firewall-restrict the `/v1/audio/transcriptions` endpoint at the reverse proxy level.
3. Apply upstream request body limits (e.g., nginx `client_max_body_size 10m`) independently of vLLM's internal check as defence-in-depth.
4. Enforce per-user concurrency limits on audio upload endpoints to cap simultaneous decompression.
5. Set hard memory limits on vLLM containers (`--memory` in Docker or cgroups) to contain blast radius to a single pod.
6. Monitor for sudden RSS spikes in vLLM containers as a detection signal. A single legitimate 30-second transcription should not cause gigabyte-scale allocation.
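The monitoring step can be prototyped as a simple growth check over periodic RSS samples. The sketch below is illustrative only: `find_rss_spikes`, the 2 GiB threshold, and the sample values are all hypothetical, and how the samples are collected (container metrics, cgroup counters, a process exporter) is left to the deployment.

```python
GIB = 1024**3


def find_rss_spikes(samples_bytes, max_growth_bytes=2 * GIB):
    """Return indices of samples that grew by more than max_growth_bytes
    compared with the previous sample."""
    spikes = []
    for i in range(1, len(samples_bytes)):
        if samples_bytes[i] - samples_bytes[i - 1] > max_growth_bytes:
            spikes.append(i)
    return spikes


# Hypothetical RSS readings taken at a fixed interval.
samples = [18 * GIB, 18 * GIB, 24 * GIB, 33 * GIB, 19 * GIB]
spikes = find_rss_spikes(samples)
```

A delta-based check is used instead of an absolute ceiling because a healthy vLLM process already holds a large, stable working set; the signal of interest is abrupt growth between samples.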
What systems are affected by CVE-2026-54233?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, multimodal AI pipelines, speech-to-text services, model serving.
What is the CVSS score for CVE-2026-54233?
CVE-2026-54233 has a CVSS v3.1 base score of 6.5 (MEDIUM). The EPSS exploitation probability is 0.03%.
What is the AI security impact?
Affected AI Architectures
MITRE ATLAS Techniques
AML.T0029 Denial of AI Service
AML.T0034.001 Resource-Intensive Queries
AML.T0049 Exploit Public-Facing Application
Compliance Controls Affected
What are the technical details?
Original Advisory
### Summary

vLLM's `/v1/audio/transcriptions` endpoint limits compressed upload size but not decoded PCM output. A 25MB OPUS file expands to ~14.9GB of float32 PCM at decode time. Tested on vLLM v0.19.0.

### Details

`SpeechToTextProcessor` rejects uploads over `VLLM_MAX_AUDIO_CLIP_FILESIZE_MB` (default 25MB) based on compressed byte length, but the audio decoder in `audio.py` accumulates all decoded frames into memory with no size limit before returning:

```python
# speech_to_text.py L184-189
if len(audio_data) / 1024 ** 2 > self.max_audio_filesize_mb:
    raise VLLMValidationError(...)
y, sr = load_audio(buf, sr=self.asr_config.sample_rate)  # decoded size unchecked

# audio.py L77-107
chunks: list[npt.NDArray] = []
for frame in container.decode(stream):
    chunks.append(frame.to_ndarray())
audio = np.concatenate(chunks, axis=-1).astype(np.float32)  # single contiguous allocation
```

A 25MB OPUS file at 6kbps encodes ~8.7 hours of audio. Decoding produces ~5.7GB of float32 PCM (232x amplification), and `np.concatenate` then allocates a second contiguous array, bringing peak RSS to ~14.9GB from a single request. `SpeechToTextConfig.max_audio_clip_s` (default 30s) applies only after the full decode and does not prevent the allocation.

### Impact

An unauthenticated attacker can exhaust server memory with a small number of concurrent requests, each a valid upload within the documented size limit. Severity was assessed with reference to prior OOM vulnerability reports in vLLM.

### Fix

A fix for this vulnerability was merged here: https://github.com/vllm-project/vllm/pull/44970
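One way to close this class of bug is to track the decoded size while frames accumulate and abort as soon as a budget is exceeded. The following is a minimal sketch of that idea and is not the code merged in PR #44970: `concat_with_budget`, `DecodedAudioTooLarge`, and the budget values are hypothetical, and plain NumPy arrays stand in for decoder frames.

```python
import numpy as np


class DecodedAudioTooLarge(ValueError):
    """Raised when decoded PCM exceeds the configured budget (hypothetical)."""


def concat_with_budget(frames, max_decoded_bytes):
    """Accumulate decoded frames, failing fast once the byte budget is exceeded."""
    chunks = []
    total_bytes = 0
    for frame in frames:
        chunk = np.asarray(frame, dtype=np.float32)
        total_bytes += chunk.nbytes
        if total_bytes > max_decoded_bytes:
            # Stop before holding more data, and before the concatenate copy.
            raise DecodedAudioTooLarge(
                f"decoded audio exceeds {max_decoded_bytes} bytes"
            )
        chunks.append(chunk)
    if not chunks:
        return np.empty(0, dtype=np.float32)
    return np.concatenate(chunks, axis=-1)


# Ten stand-in frames of 960 mono float32 samples (3840 bytes each).
frames = [np.zeros((1, 960), dtype=np.float32) for _ in range(10)]
audio = concat_with_budget(frames, max_decoded_bytes=64 * 1024)
```

Checking inside the loop rather than after it means memory use is bounded by the budget plus one frame, and the second contiguous allocation made by `np.concatenate` never happens for an oversized input. Passing a generator such as a decoder's frame iterator keeps the check streaming.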
Exploitation Scenario
An attacker registers or purchases low-tier API access to a multi-tenant vLLM platform. They craft a ~25MB OPUS file encoding approximately 8.7 hours of low-complexity audio at 6kbps — valid by the documented upload limit. They submit five concurrent POST requests to `/v1/audio/transcriptions`. Each request passes the compressed-byte size check, enters `load_audio()`, and accumulates decoded PCM frames into memory; `np.concatenate` then allocates a second contiguous ~5.7GB array, driving peak RSS to ~14.9GB per request. Five concurrent requests push the vLLM process past available RAM, triggering OOM termination within seconds and denying inference service to all users on the shared instance.
Weaknesses (CWE)
CWE-409 — Improper Handling of Highly Compressed Data (Data Amplification): The product does not handle or incorrectly handles a compressed input with a very high compression ratio that produces a large output.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
Timeline
Related Vulnerabilities
| CVE | CVSS | Description | Relation |
|---|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server | Same package: vllm |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue | Same package: vllm |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference | Same package: vllm |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access | Same package: vllm |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ | Same package: vllm |