CVE-2026-4944: vllm: trust_remote_code bypass enables RCE via HuggingFace
AWAITING NVD
vllm 0.14.1 hardcodes `trust_remote_code=True` in the NemotronVL and KimiK25 model implementation files, silently overriding an operator's explicit `--trust-remote-code=False` flag, so teams who believe they disabled remote code execution are not protected. This is an incomplete fix for two prior CVEs (CVE-2025-66448 and CVE-2026-22807), indicating a systemic pattern in the codebase where security settings are not propagated consistently through model-specific code paths. An attacker only needs to publish a malicious NemotronVL- or KimiK25-compatible repository on HuggingFace to achieve arbitrary code execution on any vllm inference server that loads it. Organizations running vllm 0.14.1 should immediately audit which model architectures are loaded in production, block NemotronVL and KimiK25 loading from untrusted registries as an interim control, and track the upstream patch.
What is the risk?
HIGH despite absent CVSS scores. The core risk multiplier is the false security guarantee: operators who explicitly set `--trust-remote-code=False` believe they are protected and will not apply additional compensating controls. The attack path is low-complexity — host a malicious HuggingFace repository and wait for a vulnerable vllm instance to load it. The incomplete-fix pattern across three CVEs (CVE-2025-66448, CVE-2026-22807, this one) suggests additional undiscovered code paths may carry the same flaw. Inference servers typically run with elevated privileges and network access, amplifying post-exploitation blast radius to the underlying host and connected systems.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vllm | pip | 0.14.1 (full range not stated) | No patch |
Deployments running vllm 0.14.1 are affected, particularly those loading NemotronVL or KimiK25 models.
What should I do?
1. Immediately audit whether NemotronVL or KimiK25 architectures are loaded in any vllm 0.14.1 deployment.
2. As an interim control, block loading of those model types from untrusted HuggingFace repositories by allowlisting model sources at the network or configuration level.
3. Monitor the vllm GitHub and huntr advisory (https://huntr.com/bounties/97f706f7-a852-49b2-a4eb-76811e611daf) for the patched release and prioritize upgrade.
4. Audit nemotron_vl.py and kimi_k25.py in your deployed version for the hardcoded flag as a detection step.
5. Review all other model implementation files in vllm/model_executor/models/ for the same pattern; given two prior incomplete fixes, additional affected files may exist.
6. Enforce model loading only from internal, verified registries and scan model files with tools like ModelScan before deployment.
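The two source-audit steps above (checking nemotron_vl.py and kimi_k25.py, then the rest of vllm/model_executor/models/) could be automated with a small static scan. The following is a minimal sketch, not an official detection tool: it uses Python's standard `ast` module to flag any call that passes the literal keyword argument `trust_remote_code=True`. The function names `find_hardcoded_trust` and `scan_models_dir` are hypothetical helpers introduced here for illustration.

```python
import ast
from pathlib import Path


def find_hardcoded_trust(source: str) -> list[int]:
    """Return line numbers of calls that pass the literal trust_remote_code=True."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if (
                kw.arg == "trust_remote_code"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                hits.append(kw.value.lineno)
    return sorted(hits)


def scan_models_dir(models_dir: str) -> dict[str, list[int]]:
    """Scan every .py file under models_dir and map file path to flagged lines."""
    report = {}
    for path in sorted(Path(models_dir).rglob("*.py")):
        try:
            hits = find_hardcoded_trust(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed
        if hits:
            report[str(path)] = hits
    return report
```

For an installed package, the directory to scan can be derived from the package location, for example `Path(vllm.__file__).parent / "model_executor" / "models"`, which follows the file paths named in the advisory. The scan only catches the literal keyword form; a value passed through a variable, a default argument, or a `**kwargs` dictionary would need manual review.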
Frequently Asked Questions
What is CVE-2026-4944?
CVE-2026-4944 is a `trust_remote_code` bypass in vllm 0.14.1. The NemotronVL and KimiK25 model implementation files hardcode `trust_remote_code=True`, overriding an operator's explicit `--trust-remote-code=False` setting, so a malicious HuggingFace repository for either architecture can execute arbitrary code on the inference server that loads it. It is an incomplete fix for CVE-2025-66448 and CVE-2026-22807.
Is CVE-2026-4944 actively exploited?
No confirmed active exploitation of CVE-2026-4944 has been reported. Because no patched release is available yet, organizations should apply the interim controls now and upgrade as soon as a fix ships.
How to fix CVE-2026-4944?
1. Immediately audit whether NemotronVL or KimiK25 architectures are loaded in any vllm 0.14.1 deployment.
2. As an interim control, block loading of those model types from untrusted HuggingFace repositories by allowlisting model sources at the network or configuration level.
3. Monitor the vllm GitHub and huntr advisory (https://huntr.com/bounties/97f706f7-a852-49b2-a4eb-76811e611daf) for the patched release and prioritize upgrade.
4. Audit nemotron_vl.py and kimi_k25.py in your deployed version for the hardcoded flag as a detection step.
5. Review all other model implementation files in vllm/model_executor/models/ for the same pattern; given two prior incomplete fixes, additional affected files may exist.
6. Enforce model loading only from internal, verified registries and scan model files with tools like ModelScan before deployment.
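The allowlisting control can also be enforced in the deployment wrapper, before a model reference ever reaches vllm. The sketch below is one possible shape for such a gate and is not a vllm feature: the organization name `my-org`, the root `/srv/models`, and the function `is_allowed_model` are all hypothetical placeholders to replace with your own values.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical allowlist values; replace with your own trusted sources.
ALLOWED_ORGS = frozenset({"my-org"})
ALLOWED_ROOTS = ("/srv/models",)


def is_allowed_model(
    model_ref: str,
    allowed_orgs: frozenset = ALLOWED_ORGS,
    allowed_roots: tuple = ALLOWED_ROOTS,
) -> bool:
    """Return True only for local paths under a trusted root or repos in a trusted org."""
    if model_ref.startswith("/"):
        # Collapse ".." segments so "/srv/models/../etc" cannot pass the check.
        normalized = PurePosixPath(posixpath.normpath(model_ref))
        return any(normalized.is_relative_to(root) for root in allowed_roots)
    # Otherwise expect a hub-style "org/name" identifier with exactly two parts.
    parts = model_ref.split("/")
    return len(parts) == 2 and all(parts) and parts[0] in allowed_orgs
```

A launcher script would call `is_allowed_model(model_ref)` and refuse to start the server when it returns `False`. Path normalization here is purely textual and does not resolve symlinks, so the trusted root itself should not be writable by untrusted users.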
What systems are affected by CVE-2026-4944?
This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, model serving, multi-model serving pipelines, batch inference pipelines, on-demand model loading workflows.
What is the CVSS score for CVE-2026-4944?
No CVSS score has been assigned yet.
AI Security Impact
MITRE ATLAS Techniques
- AML.T0010.001 AI Software
- AML.T0010.003 Model
- AML.T0011.000 Unsafe AI Artifacts
- AML.T0058 Publish Poisoned Models
- AML.T0072 Reverse Shell
Technical Details
Original Advisory
vllm-project/vllm version 0.14.1 contains a vulnerability where the `trust_remote_code=True` parameter is hardcoded in two model implementation files (`vllm/model_executor/models/nemotron_vl.py` and `vllm/model_executor/models/kimi_k25.py`). This bypasses the user's explicit `--trust-remote-code=False` setting, enabling remote code execution via malicious HuggingFace model repositories. This issue is an incomplete fix for CVE-2025-66448 and CVE-2026-22807, as it affects separate code paths in model implementation files. Deployments loading NemotronVL or KimiK25 models are particularly impacted.
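To make the flaw class concrete, the following sketch contrasts a loader that hardcodes the flag with one that propagates the operator's setting. It is an illustration only and is not taken from the vllm source or from the upstream fix, which the advisory does not describe. `fake_load_config`, `load_vulnerable`, and `load_propagated` are hypothetical names, and `fake_load_config` is a stand-in that merely records the flag it receives.

```python
def fake_load_config(repo_id: str, trust_remote_code: bool) -> dict:
    """Stand-in for a HuggingFace-style loader; it only records the flag it was given."""
    return {"repo_id": repo_id, "trust_remote_code": trust_remote_code}


def load_vulnerable(repo_id: str, operator_trust_remote_code: bool) -> dict:
    # Flawed pattern: the operator's setting is accepted but never used,
    # because the literal True is passed to the underlying loader.
    return fake_load_config(repo_id, trust_remote_code=True)


def load_propagated(repo_id: str, operator_trust_remote_code: bool) -> dict:
    # Safer pattern: the operator's setting is forwarded unchanged.
    return fake_load_config(repo_id, trust_remote_code=operator_trust_remote_code)
```

With the operator setting `False`, `load_vulnerable` still hands `True` to the loader, which is the silent override the advisory describes, while `load_propagated` hands over `False`.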
Exploitation Scenario
An adversary registers a HuggingFace account and publishes a repository that presents as a legitimate NemotronVL or KimiK25 model — leveraging AML.T0111 reputation inflation tactics such as stars, a convincing model card, and a realistic evaluation benchmark. A target organization running vllm 0.14.1 with `--trust-remote-code=False` loads this model into their inference fleet, believing they are protected. At model initialization, the hardcoded `trust_remote_code=True` in the vllm model implementation file causes Python code embedded in the model's configuration or custom modeling files to execute on the inference server. The adversary gains a reverse shell or deploys a persistent backdoor with the privileges of the vllm process, potentially accessing GPU memory, API keys, internal network segments, and downstream data stores.
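Because the scenario depends on Python code shipped inside the model repository, a pre-load check on a downloaded snapshot can surface that code before any server touches it. The sketch below assumes the snapshot is already on local disk and looks for two indicators: Python files in the repository, and an `auto_map` entry in the JSON configuration files, which is the HuggingFace convention for pointing at custom classes. The function name `custom_code_indicators` and the list of configuration filenames are assumptions for illustration, and the check complements rather than replaces a scanner such as ModelScan.

```python
import json
from pathlib import Path

# Assumed set of config files that may carry an "auto_map" entry.
CONFIG_FILES = ("config.json", "tokenizer_config.json", "preprocessor_config.json")


def custom_code_indicators(model_dir: str) -> list[str]:
    """List signs that a local model snapshot ships custom Python code."""
    root = Path(model_dir)
    findings = []
    # Any Python file in the snapshot is code that remote-code loading could run.
    for py_file in sorted(root.rglob("*.py")):
        findings.append(f"python file: {py_file.relative_to(root).as_posix()}")
    for name in CONFIG_FILES:
        cfg = root / name
        if not cfg.is_file():
            continue
        try:
            data = json.loads(cfg.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            findings.append(f"unparseable: {name}")
            continue
        if isinstance(data, dict) and "auto_map" in data:
            findings.append(f"auto_map in {name}")
    return findings
```

An empty result means neither indicator was found; it does not prove the snapshot is safe. A non-empty result is a reason to hold the model for review when the deployment is meant to run with remote code disabled.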
Related Vulnerabilities
| CVE | CVSS | Summary | Relation |
|---|---|---|---|
| CVE-2024-9053 | 9.8 | vllm: RCE via unsafe pickle deserialization in RPC server | Same package: vllm |
| CVE-2024-11041 | 9.8 | vllm: RCE via unsafe pickle deserialization in MessageQueue | Same package: vllm |
| CVE-2026-25960 | 9.8 | vllm: SSRF allows internal network access | Same package: vllm |
| CVE-2025-47277 | 9.8 | vLLM: RCE via exposed TCPStore in distributed inference | Same package: vllm |
| CVE-2025-32444 | 9.8 | vLLM: RCE via pickle deserialization on ZeroMQ | Same package: vllm |