A validation bypass in vLLM's temperature parameter handling allows any caller with API access to crash GPU inference workers by passing NaN or positive Infinity as the temperature value — Python's IEEE 754 float semantics cause all comparison guards in sampling_params.py to silently evaluate to False, letting the invalid value reach CUDA kernels where it triggers undefined behavior. With 130 downstream dependents and an EPSS score placing this vulnerability in the 88th percentile for exploitation likelihood, any vLLM-based serving infrastructure — including self-hosted LLM APIs, multi-tenant inference platforms, and derivative middleware — faces service disruption from a single malformed request. No public exploit or KEV listing exists yet, but the attack requires no special knowledge beyond knowing the parameter name and is effective against all versions up to 0.23.0. Remediate by deploying the fix from PR #45116, which adds a math.isfinite() check, or immediately reject non-finite float values at the API gateway before requests reach vLLM.
What is the risk?
Medium operational risk with high exploitability relative to the skill required. The trigger is trivial: any API caller can set temperature to NaN or +Infinity, with no authentication bypass, credential theft, or advanced technique needed. Impact is bounded to denial of service (a full inference worker crash affecting all concurrent users), with no data exfiltration or persistent system compromise. An EPSS score in the 88th percentile, paired with a medium CVSS rating, suggests the issue is regarded as straightforward to weaponize. Risk is most acute in multi-tenant or publicly accessible vLLM deployments, where a single request from any caller who can reach the endpoint can disrupt shared serving capacity.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vLLM | pip | <= 0.23.0 | No patched release listed (fix merged in PR #45116) |
What should I do?
1. Patch: apply the fix in PR #45116 / commit d598d239737, which adds math.isfinite(self.temperature) validation to _verify_args() and returns HTTP 400 for non-finite values.
2. Gateway workaround (pre-patch): add input validation at the API ingestion layer; frameworks such as FastAPI with Pydantic, or nginx with Lua, can reject temperature values that fail isfinite() checks before the request reaches vLLM.
3. Detection: monitor inference worker process restarts and CUDA error logs for unusual crash spikes; a burst of CUDA softmax errors or rapid worker recycling is the primary telemetry signal for exploitation attempts.
4. Prioritize exposure: audit all public or untrusted-user-accessible vLLM endpoints, gate those first, and treat internal multi-tenant deployments as equally urgent given the shared-worker impact.
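The gateway workaround in step 2 could look roughly like the sketch below. It uses only the Python standard library, and the helper names (`check_request_body`, `_reject_constant`) are hypothetical, not part of vLLM or any gateway product. It assumes the request body is JSON with `temperature` as a top-level field. Python's `json.loads` accepts the non-standard literals `NaN` and `Infinity` by default, and an overflowing literal such as `1e999` decodes to `inf`, so the sketch checks both paths.

```python
import json
import math


def _reject_constant(name: str) -> float:
    # json.loads calls this hook for the literals NaN, Infinity and -Infinity.
    raise ValueError(f"non-finite JSON constant rejected: {name}")


def check_request_body(raw: str) -> tuple[int, str]:
    """Return (status, message) for a raw JSON body. Hypothetical helper."""
    try:
        body = json.loads(raw, parse_constant=_reject_constant)
    except ValueError as exc:  # also covers json.JSONDecodeError
        return 400, str(exc)
    if not isinstance(body, dict):
        return 400, "request body must be a JSON object"

    temperature = body.get("temperature")
    # Overflowing literals such as 1e999 bypass parse_constant and decode to inf,
    # so the parsed value still needs an explicit finiteness check.
    if isinstance(temperature, float) and not math.isfinite(temperature):
        return 400, "temperature must be a finite number"
    return 200, "ok"
```

A body such as `{"temperature": NaN}` is rejected with status 400, while `{"temperature": 0.7}` passes through. A production gateway would likely need the same check on any other float-valued sampling fields it forwards.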
Frequently Asked Questions
Is CVE-2026-54235 actively exploited?
No confirmed active exploitation of CVE-2026-54235 has been reported, but organizations should still patch proactively.
What systems are affected by CVE-2026-54235?
This vulnerability affects the following AI/ML architecture patterns: model serving, LLM inference APIs, multi-tenant inference platforms, AI gateway deployments, self-hosted LLM deployments.
What is the CVSS score for CVE-2026-54235?
No CVSS score has been assigned yet.
What is the AI security impact?
MITRE ATLAS Techniques
- AML.T0029 Denial of AI Service
- AML.T0043.003 Manual Modification
- AML.T0049 Exploit Public-Facing Application
Compliance Controls Affected
What are the technical details?
Original Advisory
## Summary

All temperature validation gates use comparison operators (`<`, `>`), which silently evaluate to `False` for `NaN` and for positive `Infinity` in Python's IEEE 754 float semantics. Both values pass every guard and propagate to GPU sampling kernels, where they produce undefined behavior or CUDA errors that can crash the inference worker.

Note: `-Infinity` is correctly caught.

## Root Cause

`sampling_params.py:384`:

```python
if 0 < self.temperature < _MAX_TEMP:  # NaN → False; +Inf → False
```

`sampling_params.py:462`:

```python
if self.temperature < 0.0:  # NaN → False; +Inf → False
    raise VLLMValidationError(...)
```

No `math.isnan()` or `math.isinf()` check exists anywhere in `sampling_params.py`.

Python semantics (verified): `float('nan') < 0.0` → `False`, `float('inf') < 0.0` → `False`.

## Impact

Crash of inference worker on GPU kernel execution with NaN/Inf softmax input, degrading service for all concurrent users.

## Remediation

Add `math.isfinite(self.temperature)` check in `_verify_args()`. Reject non-finite float values with a 400 error.

## Fix

A fix for this vulnerability was merged here: https://github.com/vllm-project/vllm/pull/45116
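The root cause described in the advisory can be reproduced outside vLLM with a minimal sketch. The two functions below are simplified stand-ins, not vLLM's actual `_verify_args()`: the first mirrors the shape of the comparison-only guard quoted above, and the second adds the `math.isfinite()` check that the remediation describes. All function names are hypothetical, and `ValueError` stands in for `VLLMValidationError`.

```python
import math


def verify_comparison_only(temperature: float) -> None:
    # Same shape as the quoted guard: NaN < 0.0 and +inf < 0.0 are both False,
    # so neither value raises here.
    if temperature < 0.0:
        raise ValueError("temperature must be non-negative")


def verify_with_isfinite(temperature: float) -> None:
    # Sketch of the remediation: reject NaN and +/-inf before any comparison.
    if not math.isfinite(temperature):
        raise ValueError("temperature must be a finite number")
    if temperature < 0.0:
        raise ValueError("temperature must be non-negative")


def is_accepted(check, value: float) -> bool:
    """Return True if the given check lets the value through."""
    try:
        check(value)
    except ValueError:
        return False
    return True


for value in (float("nan"), float("inf"), float("-inf"), 0.7):
    print(value, is_accepted(verify_comparison_only, value),
          is_accepted(verify_with_isfinite, value))
```

Under the comparison-only guard, `nan` and `inf` are accepted and only `-inf` is rejected, which matches the advisory's note. With the finiteness check in front, all three non-finite values are rejected and `0.7` still passes.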
Exploitation Scenario
An adversary with access to a vLLM inference API endpoint (whether via an open multi-tenant deployment, a compromised API key, or an internally exposed service) crafts a standard inference request and sets the temperature field to NaN or +Infinity (float('nan') or float('inf') in Python). Under IEEE 754 semantics, the ordering comparisons used by the guards in sampling_params.py evaluate to False for both values, so the request passes validation without raising any exception. The non-finite value is forwarded to the CUDA sampling kernel, which receives an invalid softmax input, produces undefined behavior or a hard CUDA error, and crashes the inference worker process. All serving capacity shared by concurrent users is immediately lost, and the disruption persists until the worker is restarted, which gives the adversary a repeatable, low-cost mechanism to keep the service down.
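To see why a non-finite temperature yields an invalid softmax input, the following CPU-only sketch applies temperature scaling in plain Python. It is an illustration under simplifying assumptions and does not reproduce vLLM's sampling kernels. The function name is hypothetical, and the `-inf` logit is an assumed example of a masked token.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Naive temperature-scaled softmax (no max-subtraction, illustration only)."""
    scaled = [x / temperature for x in logits]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Finite temperature: a valid probability distribution, masked token gets 0.
print(softmax_with_temperature([2.0, float("-inf")], 1.0))

# NaN temperature: every scaled logit becomes NaN, so every probability is NaN.
print(softmax_with_temperature([2.0, 1.0, 0.5], float("nan")))

# +inf temperature with a masked (-inf) logit: -inf / inf is NaN, which then
# contaminates the normalizing sum and therefore every probability.
print(softmax_with_temperature([2.0, float("-inf")], float("inf")))
```

In this sketch the NaN values simply propagate through the arithmetic. What a GPU kernel does when asked to sample from such a distribution depends on the implementation, which is consistent with the advisory's description of undefined behavior or CUDA errors.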
Weaknesses (CWE)
CWE-1287 — Improper Validation of Specified Type of Input: The product receives input that is expected to be of a certain type, but it does not validate or incorrectly validates that the input is actually of the expected type.
- [Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation. However, denylis
Source: MITRE CWE corpus.
Related Vulnerabilities
| CVE | Score | Summary | Relation |
|---|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server | Same package: vllm |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access | Same package: vllm |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference | Same package: vllm |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue | Same package: vllm |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ | Same package: vllm |