vllm: trust_remote_code bypass enables RCE via HuggingFace — AWAITING NVD (CVE-2026-4944)

CISO Take

vllm 0.14.1 hardcodes `trust_remote_code=True` in the NemotronVL and KimiK25 model implementation files, silently overriding an operator's explicit `--trust-remote-code=False` flag — meaning teams who believe they disabled remote code execution are not protected. This is an incomplete fix for two prior CVEs (CVE-2025-66448 and CVE-2026-22807), indicating a systemic pattern in the codebase where security settings are not propagated consistently through model-specific code paths. An attacker only needs to publish a malicious NemotronVL- or KimiK25-compatible repository on HuggingFace to achieve arbitrary code execution on any vllm inference server loading those architectures. Organizations running vllm 0.14.1 should immediately audit which model architectures are loaded in production, block NemotronVL and KimiK25 loading from untrusted registries as an interim control, and track the upstream patch.

Sources: NVD, ATLAS, huntr.com

What is the risk?

HIGH despite absent CVSS scores. The core risk multiplier is the false security guarantee: operators who explicitly set `--trust-remote-code=False` believe they are protected and will not apply additional compensating controls. The attack path is low-complexity — host a malicious HuggingFace repository and wait for a vulnerable vllm instance to load it. The incomplete-fix pattern across three CVEs (CVE-2025-66448, CVE-2026-22807, this one) suggests additional undiscovered code paths may carry the same flaw. Inference servers typically run with elevated privileges and network access, amplifying post-exploitation blast radius to the underlying host and connected systems.

Attack Kill Chain

Publish Poisoned Model

Adversary publishes a malicious NemotronVL- or KimiK25-compatible model repository on HuggingFace containing executable Python code in model configuration or custom modeling files.

AML.T0058

Model Load Trigger

Victim's vllm 0.14.1 instance loads the malicious model; despite `--trust-remote-code=False` being set, the hardcoded `trust_remote_code=True` in the model implementation file overrides the operator's security setting at initialization time.

AML.T0010.003

Remote Code Execution

Malicious Python code embedded in the model executes on the inference server with the privileges of the vllm process before any inference request is processed.

AML.T0011.000

Host Compromise

Adversary establishes persistent access (reverse shell, backdoor) to the inference server, gaining access to GPU memory, API credentials, internal network segments, and downstream data stores.

AML.T0072
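The first two stages depend on the repository shipping executable Python alongside the weights. As a rough pre-load check, the sketch below inspects a locally downloaded model snapshot for two common signs of that: an `auto_map` entry in `config.json` (the Transformers convention for pointing Auto classes at repository-supplied code) and any `.py` files. The function name and the returned dictionary layout are illustrative assumptions, not part of vllm or the advisory, and a clean result does not prove a snapshot is safe.

```python
import json
from pathlib import Path


def custom_code_indicators(model_dir: str) -> dict:
    """Report signs that a local model snapshot ships executable Python.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    root = Path(model_dir)
    auto_map = {}
    config_path = root / "config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # `auto_map` maps Auto* classes to code files stored in the repository.
        auto_map = config.get("auto_map") or {}
    # Any Python file in the snapshot is a candidate for remote code loading.
    py_files = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))
    return {
        "auto_map": auto_map,
        "python_files": py_files,
        "ships_code": bool(auto_map or py_files),
    }
```

A snapshot that reports `ships_code` as true deserves manual review before it is loaded by an affected vllm version, regardless of how `--trust-remote-code` is set.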

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
vllm	pip	0.14.1	No patch

Severity & Risk

CVSS 3.1

N/A

EPSS

N/A

Exploitation Status

No known exploitation

Sophistication

Moderate

What should I do?

1. Immediately audit whether NemotronVL or KimiK25 architectures are loaded in any vllm 0.14.1 deployment.
2. As an interim control, block loading of those model types from untrusted HuggingFace repositories by allowlisting model sources at the network or configuration level.
3. Monitor the vllm GitHub and huntr advisory (https://huntr.com/bounties/97f706f7-a852-49b2-a4eb-76811e611daf) for the patched release and prioritize upgrade.
4. Audit `nemotron_vl.py` and `kimi_k25.py` in your deployed version for the hardcoded flag as a detection step.
5. Review all other model implementation files in `vllm/model_executor/models/` for the same pattern; given two prior incomplete fixes, additional affected files may exist.
6. Enforce model loading only from internal, verified registries and scan model files with tools like ModelScan before deployment.
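The source audit in the steps above can be partly automated. The following is a minimal sketch, assuming the installed vllm source tree is readable on disk: it parses each Python file with the standard `ast` module and reports calls that pass the literal `trust_remote_code=True`. The function names are hypothetical. A hit is only a lead for manual review, and the scan misses the flag when it is passed through a dictionary or `**kwargs`.

```python
import ast
from pathlib import Path


def find_hardcoded_trust(source: str) -> list[int]:
    """Return line numbers where a call passes trust_remote_code=True literally."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if (
                kw.arg == "trust_remote_code"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                # Record the line of the literal itself, not the call start.
                hits.append(kw.value.lineno)
    return sorted(hits)


def scan_tree(root: str) -> dict[str, list[int]]:
    """Scan every .py file under root; files that fail to parse are skipped."""
    results = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            lines = find_hardcoded_trust(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue
        if lines:
            results[str(path)] = lines
    return results
```

Pointing `scan_tree` at the `vllm/model_executor/models/` directory inside the deployed package covers both the two named files and the wider review, since every file in that directory is checked in one pass.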

Classification

Supply Chain, Code Execution, Inference, Model, AML.T0010.001 - AI Software, AML.T0010.003 - Model, AML.T0011.000 - Unsafe AI Artifacts, AML.T0058 - Publish Poisoned Models, AML.T0072 - Reverse Shell

Compliance Impact

This CVE is relevant to:

EU AI Act

Article 9 - Risk management system

ISO 42001

A.6.2.3 - AI system supply chain security

NIST AI RMF

MANAGE-2.2 - Mechanisms to sustain oversight of AI systems

OWASP LLM Top 10

LLM03 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2026-4944?

vllm 0.14.1 hardcodes `trust_remote_code=True` in the NemotronVL and KimiK25 model implementation files, silently overriding an operator's explicit `--trust-remote-code=False` flag — meaning teams who believe they disabled remote code execution are not protected. This is an incomplete fix for two prior CVEs (CVE-2025-66448 and CVE-2026-22807), indicating a systemic pattern in the codebase where security settings are not propagated consistently through model-specific code paths. An attacker only needs to publish a malicious NemotronVL- or KimiK25-compatible repository on HuggingFace to achieve arbitrary code execution on any vllm inference server loading those architectures. Organizations running vllm 0.14.1 should immediately audit which model architectures are loaded in production, block NemotronVL and KimiK25 loading from untrusted registries as an interim control, and track the upstream patch.

Is CVE-2026-4944 actively exploited?

No confirmed active exploitation of CVE-2026-4944 has been reported. Because no patched release is available yet, organizations should apply the interim controls now and upgrade as soon as a fix ships.

How to fix CVE-2026-4944?

1. Immediately audit whether NemotronVL or KimiK25 architectures are loaded in any vllm 0.14.1 deployment.
2. As an interim control, block loading of those model types from untrusted HuggingFace repositories by allowlisting model sources at the network or configuration level.
3. Monitor the vllm GitHub and huntr advisory (https://huntr.com/bounties/97f706f7-a852-49b2-a4eb-76811e611daf) for the patched release and prioritize upgrade.
4. Audit `nemotron_vl.py` and `kimi_k25.py` in your deployed version for the hardcoded flag as a detection step.
5. Review all other model implementation files in `vllm/model_executor/models/` for the same pattern; given two prior incomplete fixes, additional affected files may exist.
6. Enforce model loading only from internal, verified registries and scan model files with tools like ModelScan before deployment.

What systems are affected by CVE-2026-4944?

This vulnerability affects the following AI/ML architecture patterns: LLM inference serving, model serving, multi-model serving pipelines, batch inference pipelines, on-demand model loading workflows.

What is the CVSS score for CVE-2026-4944?

No CVSS score has been assigned yet.

AI Security Impact

Affected AI Architectures

LLM inference serving, model serving, multi-model serving pipelines, batch inference pipelines, on-demand model loading workflows

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0010.003 Model

AML.T0011.000 Unsafe AI Artifacts

AML.T0058 Publish Poisoned Models

AML.T0072 Reverse Shell

Compliance Controls Affected

EU AI Act: Article 9

ISO 42001: A.6.2.3

NIST AI RMF: MANAGE-2.2

OWASP LLM Top 10: LLM03

Technical Details

Original Advisory

vllm-project/vllm version 0.14.1 contains a vulnerability where the `trust_remote_code=True` parameter is hardcoded in two model implementation files (`vllm/model_executor/models/nemotron_vl.py` and `vllm/model_executor/models/kimi_k25.py`). This bypasses the user's explicit `--trust-remote-code=False` setting, enabling remote code execution via malicious HuggingFace model repositories. This issue is an incomplete fix for CVE-2025-66448 and CVE-2026-22807, as it affects separate code paths in model implementation files. Deployments loading NemotronVL or KimiK25 models are particularly impacted.
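To make the flaw class concrete, the sketch below contrasts a loader helper that hardcodes the flag with one that propagates the operator's setting. It is an illustration only: `OperatorConfig`, `fake_from_pretrained`, and both helper functions are hypothetical stand-ins and do not reproduce the actual vllm source in either affected file.

```python
from dataclasses import dataclass


@dataclass
class OperatorConfig:
    """Hypothetical stand-in for the server-level settings an operator controls."""
    trust_remote_code: bool = False


def fake_from_pretrained(repo_id: str, trust_remote_code: bool = False) -> str:
    """Stand-in for a model loader; reports whether repository code would run."""
    return "executes repo code" if trust_remote_code else "refuses repo code"


def load_config_hardcoded(repo_id: str, cfg: OperatorConfig) -> str:
    # Flawed pattern: the literal True ignores cfg entirely, so the
    # operator's choice never reaches the loader.
    return fake_from_pretrained(repo_id, trust_remote_code=True)


def load_config_propagated(repo_id: str, cfg: OperatorConfig) -> str:
    # Safer pattern: the operator's setting is the only source of truth.
    return fake_from_pretrained(repo_id, trust_remote_code=cfg.trust_remote_code)
```

With `OperatorConfig(trust_remote_code=False)`, the hardcoded helper still takes the code-executing branch while the propagated helper refuses, which is the gap between what the operator configured and what the process does.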

Exploitation Scenario

An adversary registers a HuggingFace account and publishes a repository that presents as a legitimate NemotronVL or KimiK25 model — leveraging AML.T0111 reputation inflation tactics such as stars, a convincing model card, and a realistic evaluation benchmark. A target organization running vllm 0.14.1 with `--trust-remote-code=False` loads this model into their inference fleet, believing they are protected. At model initialization, the hardcoded `trust_remote_code=True` in the vllm model implementation file causes Python code embedded in the model's configuration or custom modeling files to execute on the inference server. The adversary gains a reverse shell or deploys a persistent backdoor with the privileges of the vllm process, potentially accessing GPU memory, API keys, internal network segments, and downstream data stores.
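One way to blunt this scenario is to refuse any model reference that is not on a reviewed allowlist, pinned to a specific commit, before the reference ever reaches the inference server. The sketch below shows such a gate under stated assumptions: the repository name, the commit hash, and the function are all hypothetical, and how the check is wired into a deployment (wrapper script, admission controller, CI step) is left open.

```python
from typing import Mapping, Optional

# Hypothetical allowlist: repository id -> the single reviewed commit hash.
APPROVED_MODELS = {
    "internal-org/nemotron-vl-reviewed": "0123456789abcdef0123456789abcdef01234567",
}


def check_model_request(
    repo_id: str,
    revision: Optional[str],
    approved: Mapping[str, str] = APPROVED_MODELS,
) -> None:
    """Raise PermissionError unless repo_id is approved and pinned to its reviewed commit."""
    expected = approved.get(repo_id)
    if expected is None:
        raise PermissionError(f"model source not allowlisted: {repo_id}")
    if revision != expected:
        # Branch names such as "main" can move; only the reviewed hash passes.
        raise PermissionError(f"{repo_id} must be pinned to reviewed revision {expected}")
```

Pinning to a commit hash rather than a branch matters here because a repository that looked benign at review time can be updated later with malicious code.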