CVE-2026-4944: vllm: trust_remote_code bypass enables RCE via HuggingFace

AWAITING NVD
Published: May 28, 2026

CISO Take

vllm 0.14.1 hardcodes `trust_remote_code=True` in the NemotronVL and KimiK25 model implementation files, silently overriding an operator's explicit `--trust-remote-code=False` flag — meaning teams who believe they disabled remote code execution are not protected. This is an incomplete fix for two prior CVEs (CVE-2025-66448 and CVE-2026-22807), indicating a systemic pattern in the codebase where security settings are not propagated consistently through model-specific code paths. An attacker only needs to publish a malicious NemotronVL- or KimiK25-compatible repository on HuggingFace to achieve arbitrary code execution on any vllm inference server loading those architectures. Organizations running vllm 0.14.1 should immediately audit which model architectures are loaded in production, block NemotronVL and KimiK25 loading from untrusted registries as an interim control, and track the upstream patch.

Sources: NVD, ATLAS, huntr.com

What is the risk?

Rated HIGH even though no CVSS score has been assigned. The core risk multiplier is the false security guarantee: operators who explicitly set `--trust-remote-code=False` believe they are protected and are unlikely to apply additional compensating controls. The attack path is low-complexity: host a malicious HuggingFace repository and wait for a vulnerable vllm instance to load it. The incomplete-fix pattern across three CVEs (CVE-2025-66448, CVE-2026-22807, and this one) suggests that additional, undiscovered code paths may carry the same flaw. Inference servers typically run with elevated privileges and network access, which extends the post-exploitation blast radius to the underlying host and connected systems.

Attack Kill Chain

  1. Publish Poisoned Model (AML.T0058): Adversary publishes a malicious NemotronVL- or KimiK25-compatible model repository on HuggingFace containing executable Python code in model configuration or custom modeling files.

  2. Model Load Trigger (AML.T0010.003): Victim's vllm 0.14.1 instance loads the malicious model; despite `--trust-remote-code=False` being set, the hardcoded `trust_remote_code=True` in the model implementation file overrides the operator's security setting at initialization time.

  3. Remote Code Execution (AML.T0011.000): Malicious Python code embedded in the model executes on the inference server with the privileges of the vllm process before any inference request is processed.

  4. Host Compromise (AML.T0072): Adversary establishes persistent access (reverse shell, backdoor) to the inference server, gaining access to GPU memory, API credentials, internal network segments, and downstream data stores.

What systems are affected?

Package | Ecosystem | Vulnerable Range | Patched
vllm | pip | (not listed) | No patch

Do you use vllm? If you run version 0.14.1 and load NemotronVL or KimiK25 models, you are affected.

Severity & Risk

CVSS 3.1: N/A
EPSS: N/A
Exploitation Status: No known exploitation
Sophistication: Moderate

What should I do?

  1. Immediately audit whether NemotronVL or KimiK25 architectures are loaded in any vllm 0.14.1 deployment.

  2. As an interim control, block loading of those model types from untrusted HuggingFace repositories by allowlisting model sources at the network or configuration level.

  3. Monitor the vllm GitHub and huntr advisory (https://huntr.com/bounties/97f706f7-a852-49b2-a4eb-76811e611daf) for the patched release and prioritize upgrade.

  4. Audit nemotron_vl.py and kimi_k25.py in your deployed version for the hardcoded flag as a detection step.

  5. Review all other model implementation files in vllm/model_executor/models/ for the same pattern — given two prior incomplete fixes, additional affected files may exist.

  6. Enforce model loading only from internal, verified registries and scan model files with tools like ModelScan before deployment.
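
Steps 4 and 5 can be partly automated. The following is a minimal sketch, not an official detection tool: it parses each Python file with the standard-library `ast` module and reports calls that pass the literal keyword argument `trust_remote_code=True`. The function names `find_hardcoded_trust` and `scan_tree` are illustrative, and the directory to scan is whatever path your installed vllm package uses.

```python
import ast
import sys
from pathlib import Path


def find_hardcoded_trust(source: str) -> list[int]:
    """Return line numbers of calls that pass the literal trust_remote_code=True."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if (
                kw.arg == "trust_remote_code"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                hits.append(kw.value.lineno)
    return sorted(hits)


def scan_tree(root: Path) -> dict[str, list[int]]:
    """Scan every .py file under root; files that cannot be parsed are skipped."""
    findings = {}
    for path in sorted(root.rglob("*.py")):
        try:
            lines = find_hardcoded_trust(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError, ValueError):
            continue
        if lines:
            findings[str(path)] = lines
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the models directory of your installed vllm as the first argument.
    for file, lines in scan_tree(Path(sys.argv[1])).items():
        print(f"{file}: trust_remote_code=True at lines {lines}")
```

This only catches the literal keyword form. A value passed through a dictionary, a variable, or a positional argument would not be flagged, so an empty result is not proof that a file is safe.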

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act: Article 9 - Risk management system
ISO 42001: A.6.2.3 - AI system supply chain security
NIST AI RMF: MANAGE-2.2 - Mechanisms to sustain oversight of AI systems
OWASP LLM Top 10: LLM03 - Supply Chain Vulnerabilities

Frequently Asked Questions

Is CVE-2026-4944 actively exploited?

No confirmed active exploitation of CVE-2026-4944 has been reported. Because no patched release is available yet, organizations should apply the interim controls now and upgrade as soon as a fix ships.

What systems are affected by CVE-2026-4944?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, model serving, multi-model serving pipelines, batch inference pipelines, on-demand model loading workflows.

What is the CVSS score for CVE-2026-4944?

No CVSS score has been assigned yet.

AI Security Impact

Affected AI Architectures

LLM inference serving, model serving, multi-model serving pipelines, batch inference pipelines, on-demand model loading workflows

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0010.003 Model
AML.T0011.000 Unsafe AI Artifacts
AML.T0058 Publish Poisoned Models
AML.T0072 Reverse Shell

Compliance Controls Affected

EU AI Act: Article 9
ISO 42001: A.6.2.3
NIST AI RMF: MANAGE-2.2
OWASP LLM Top 10: LLM03

Technical Details

Original Advisory

vllm-project/vllm version 0.14.1 contains a vulnerability where the `trust_remote_code=True` parameter is hardcoded in two model implementation files (`vllm/model_executor/models/nemotron_vl.py` and `vllm/model_executor/models/kimi_k25.py`). This bypasses the user's explicit `--trust-remote-code=False` setting, enabling remote code execution via malicious HuggingFace model repositories. This issue is an incomplete fix for CVE-2025-66448 and CVE-2026-22807, as it affects separate code paths in model implementation files. Deployments loading NemotronVL or KimiK25 models are particularly impacted.
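
The advisory does not quote the affected code, so the following is a hypothetical reduction of the flaw class rather than vllm's actual source. `ModelConfig`, `load_processor`, and both `init_model_*` functions are invented for illustration; `load_processor` stands in for any HuggingFace-style loader that accepts a `trust_remote_code` argument.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical stand-in for the operator's resolved settings."""
    model: str
    trust_remote_code: bool = False


def load_processor(repo: str, trust_remote_code: bool) -> str:
    """Stand-in for a HuggingFace-style loader; reports what it would do."""
    return "executes repo code" if trust_remote_code else "refuses repo code"


def init_model_hardcoded(cfg: ModelConfig) -> str:
    # Flawed pattern: the literal True ignores cfg.trust_remote_code.
    return load_processor(cfg.model, trust_remote_code=True)


def init_model_propagated(cfg: ModelConfig) -> str:
    # Intended pattern: the operator's setting reaches the loader unchanged.
    return load_processor(cfg.model, trust_remote_code=cfg.trust_remote_code)


cfg = ModelConfig(model="some-org/some-model", trust_remote_code=False)
print(init_model_hardcoded(cfg))
print(init_model_propagated(cfg))
```

With the operator's setting at `False`, only the propagated variant refuses repository code. The hardcoded variant behaves the same regardless of configuration, which is why the command-line flag gives no protection on the affected code paths.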

Exploitation Scenario

An adversary registers a HuggingFace account and publishes a repository that presents as a legitimate NemotronVL or KimiK25 model — leveraging AML.T0111 reputation inflation tactics such as stars, a convincing model card, and a realistic evaluation benchmark. A target organization running vllm 0.14.1 with `--trust-remote-code=False` loads this model into their inference fleet, believing they are protected. At model initialization, the hardcoded `trust_remote_code=True` in the vllm model implementation file causes Python code embedded in the model's configuration or custom modeling files to execute on the inference server. The adversary gains a reverse shell or deploys a persistent backdoor with the privileges of the vllm process, potentially accessing GPU memory, API keys, internal network segments, and downstream data stores.
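
Because the scenario depends on Python code shipped inside the model repository, one possible pre-load gate is to inspect a downloaded snapshot before vllm ever sees it. The sketch below is an assumption-laden example, not a complete scanner: it flags top-level JSON files that declare an `auto_map` key (the field Transformers uses to point at custom classes in a repository) and any `.py` files in the snapshot. The function name `custom_code_indicators` is illustrative.

```python
import json
from pathlib import Path


def custom_code_indicators(model_dir: Path) -> list[str]:
    """List signs that a local model snapshot carries custom Python code."""
    reasons = []
    # Top-level JSON configs that map Auto* classes to code in the repo.
    for json_file in sorted(model_dir.glob("*.json")):
        try:
            data = json.loads(json_file.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue
        if isinstance(data, dict) and data.get("auto_map"):
            reasons.append(f"{json_file.name} declares auto_map")
    # Any Python source anywhere in the snapshot.
    for py_file in sorted(model_dir.rglob("*.py")):
        rel = py_file.relative_to(model_dir).as_posix()
        reasons.append(f"python file: {rel}")
    return reasons
```

A deployment pipeline could refuse to promote any snapshot for which this function returns a non-empty list unless the code has been reviewed. The check does not inspect weight files, so it complements rather than replaces a model-file scanner.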

Timeline

Published: May 28, 2026
Last Modified: May 28, 2026
First Seen: May 28, 2026
