CVE-2025-66448: vllm: Code Injection enables RCE

GHSA-8fr4-5q9j-m8gm HIGH CISA: TRACK*
Published December 1, 2025
CISO Take

vLLM's trust_remote_code=False setting can be bypassed in versions prior to 0.11.1: attackers can publish a benign-looking model on any public hub (e.g., Hugging Face) that silently executes arbitrary Python on your inference server at load time. If you run vLLM in production, upgrade to 0.11.1 or later immediately and audit every model source your pipelines pull from. Until patched, treat every external model load as a potential RCE vector regardless of your trust settings.

What is the risk?

HIGH. A CVSS score of 8.8, a network-accessible attack path, and no privilege requirements make this broadly exploitable; the only precondition is that a victim loads the attacker's model (UI: Required). The critical aggravating factor is the security control bypass: organizations that explicitly set trust_remote_code=False believe they are protected when they are not. vLLM is widely deployed in enterprise LLM serving infrastructure, so the blast radius is large. EPSS is low (below 1%) at time of publication, but exploitation is straightforward once the vulnerability is understood: a malicious model repo is the only infrastructure needed. The CVE is not yet in CISA KEV, but supply-chain RCE in AI inference engines warrants a proactive response.

What systems are affected?

Package   Ecosystem   Vulnerable Range   Patched
vLLM      pip         -                  No patch
vLLM      pip         < 0.11.1           0.11.1

Package stats (vLLM): 83.4K · 130 dependents · Pushed 3d ago · 34% patched · ~32d to patch

How severe is it?

CVSS 3.1
8.8 / 10
EPSS
0.6%
chance of exploitation in 30 days
Higher than 43% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Moderate

What is the attack surface?

AV Network
AC Low
PR None
UI Required
S Unchanged
C High
I High
A High

What should I do?

  1. PATCH

    Upgrade vLLM to >= 0.11.1 immediately (pip install --upgrade vllm).

  2. INTERIM WORKAROUND

    Until patched, restrict model sources to an internal registry or a vetted allowlist — do not load arbitrary community models.

  3. AUDIT

    Review all model sources currently in use; check config.json files for auto_map entries pointing to external repositories.

  4. DO NOT TRUST THE FLAG

    Explicitly passing trust_remote_code=False is NOT a compensating control in affected versions. Stop citing it in your runbooks as a safeguard until every host is upgraded.

  5. SANDBOX

    Run model loading in isolated containers with no network egress to reduce blast radius.

  6. DETECT

    Monitor for unexpected outbound connections from vLLM processes during model initialization; alert on connections to github.com or huggingface.co from inference hosts that are not part of approved model pull workflows.

  7. VERIFY

    After patching, confirm your vllm version with pip show vllm.
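
The version check in steps 1 and 7 can also be scripted for fleet-wide verification. The following is a minimal sketch, assuming that comparing the leading numeric release segment is sufficient; the helper names release_tuple and is_patched are hypothetical, and pre-release builds such as 0.11.1rc1 would be reported as patched, so treat the result as a first-pass filter rather than proof.

```python
import re
from importlib import metadata

FIXED_VERSION = "0.11.1"  # first patched release, per the advisory


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '0.10.2+cu128' -> (0, 10, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_patched(installed: str, fixed: str = FIXED_VERSION) -> bool:
    """True if the installed release is at or above the fixed release.

    Simplification (assumption): pre-release and dev suffixes are ignored.
    """
    return release_tuple(installed) >= release_tuple(fixed)


if __name__ == "__main__":
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        status = "patched" if is_patched(installed) else "VULNERABLE"
        print(f"vllm {installed}: {status}")
```

Running the script inside each serving image or virtual environment reports the status of the vLLM copy that environment would actually import.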

What does CISA's SSVC say?

Decision Track*
Exploitation none
Automatable Yes
Technical Impact total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
  • Art. 25 - Obligations of Deployers
  • Art. 9 - Risk management system
  • Art. 13 - Transparency and provision of information
ISO 42001
  • A.6.1.2 - AI Supply Chain Management
  • A.6.1.5 - Information security in project management
  • A.8.1.1 - Inventory of assets and supplier management
  • A.9.2 - AI Risk Treatment
NIST AI RMF
  • GOVERN 6.1 - AI supply chain risk management (policies and practices for AI supply chain risk)
  • MANAGE 2.2 - Mechanisms for Managing AI Risks
  • MAP 5.2 - AI system risks propagated to downstream users
OWASP LLM Top 10
  • LLM03 - Supply Chain Vulnerabilities
  • LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2025-66448?

CVE-2025-66448 is a code injection vulnerability (CWE-94) in vLLM versions prior to 0.11.1. When vLLM loads a model config that contains an auto_map entry, it fetches and executes Python from the repository that entry references, even when trust_remote_code=False is set. An attacker who gets a victim to load a benign-looking model can therefore run arbitrary code on the inference server at load time. The issue is fixed in 0.11.1.

Is CVE-2025-66448 actively exploited?

No confirmed active exploitation of CVE-2025-66448 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-66448?

1. PATCH: Upgrade vLLM to >= 0.11.1 immediately (pip install --upgrade vllm).
2. INTERIM WORKAROUND: Until patched, restrict model sources to an internal registry or a vetted allowlist; do not load arbitrary community models.
3. AUDIT: Review all model sources currently in use; check config.json files for auto_map entries pointing to external repositories.
4. DO NOT TRUST THE FLAG: Explicitly passing trust_remote_code=False is NOT a compensating control in affected versions. Stop citing it in your runbooks as a safeguard until every host is upgraded.
5. SANDBOX: Run model loading in isolated containers with no network egress to reduce blast radius.
6. DETECT: Monitor for unexpected outbound connections from vLLM processes during model initialization; alert on connections to github.com or huggingface.co from inference hosts that are not part of approved model pull workflows.
7. VERIFY: After patching, confirm your vllm version with pip show vllm.
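
The interim allowlist workaround can be enforced in code before any model identifier reaches vLLM. This is a minimal sketch, not a complete control: the prefixes in ALLOWED_PREFIXES and the helper names are hypothetical placeholders for your own internal registry paths and vetted organizations.

```python
from typing import Iterable

# Hypothetical allowlist: replace with your internal registry paths
# or vetted hub organizations.
ALLOWED_PREFIXES = ("/models/internal/", "my-org/")


def is_allowed_model_source(
    model_id: str, allowed_prefixes: Iterable[str] = ALLOWED_PREFIXES
) -> bool:
    """True if model_id starts with an approved prefix and has no '..' segment."""
    if ".." in model_id.replace("\\", "/").split("/"):
        return False  # reject path traversal out of an approved directory
    return any(model_id.startswith(prefix) for prefix in allowed_prefixes)


def require_allowed(model_id: str) -> str:
    """Return model_id unchanged, or raise if it is not on the allowlist."""
    if not is_allowed_model_source(model_id):
        raise PermissionError(f"model source not on allowlist: {model_id!r}")
    return model_id
```

A pipeline would then wrap every load, for example by passing require_allowed(name) as the model argument instead of the raw name. On unpatched versions the allowlist only helps if the approved repositories themselves contain no auto_map entries pointing elsewhere, so it complements the audit step rather than replacing it.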

What systems are affected by CVE-2025-66448?

This vulnerability affects the following AI/ML architecture patterns: LLM inference servers, model serving, AI/ML deployment pipelines, model hub integrations, MLOps CI/CD pipelines.

What is the CVSS score for CVE-2025-66448?

CVE-2025-66448 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 0.57%.

What is the AI security impact?

Affected AI Architectures

LLM inference servers, model serving, AI/ML deployment pipelines, model hub integrations, MLOps CI/CD pipelines

MITRE ATLAS Techniques

AML.T0002.001 Models
AML.T0010.001 AI Software
AML.T0010.003 Model
AML.T0011.000 Unsafe AI Artifacts
AML.T0021 Establish Accounts
AML.T0058 Publish Poisoned Models
AML.T0072 Reverse Shell
AML.T0074 Masquerading

Compliance Controls Affected

EU AI Act: Art. 25, Art. 9, Art. 13
ISO 42001: A.6.1.2, A.6.1.5, A.8.1.1, A.9.2
NIST AI RMF: GOVERN 6.1, MANAGE 2.2, MAP 5.2
OWASP LLM Top 10: LLM03, LLM05

What are the technical details?

Original Advisory

vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.11.1, vllm has a critical remote code execution vector in a config class named Nemotron_Nano_VL_Config. When vllm loads a model config that contains an auto_map entry, the config class resolves that mapping with get_class_from_dynamic_module(...) and immediately instantiates the returned class. This fetches and executes Python from the remote repository referenced in the auto_map string. Crucially, this happens even when the caller explicitly sets trust_remote_code=False in vllm.transformers_utils.config.get_config. In practice, an attacker can publish a benign-looking frontend repo whose config.json points via auto_map to a separate malicious backend repo; loading the frontend will silently run the backend’s code on the victim host. This vulnerability is fixed in 0.11.1.
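
The auto_map mechanism described in the advisory suggests a simple static audit of downloaded model directories. The sketch below is illustrative only and rests on one assumption: that a cross-repository reference uses the "repo_id--module.ClassName" form of the Hugging Face transformers dynamic-module convention, so a "--" separator marks code hosted in another repo. The function names are hypothetical.

```python
import json
from pathlib import Path


def external_auto_map_refs(config: dict) -> list:
    """Return (key, reference) pairs whose auto_map value names another repo.

    Assumption: cross-repo references look like 'owner/repo--module.ClassName'.
    """
    refs = []
    auto_map = config.get("auto_map") or {}
    if not isinstance(auto_map, dict):
        return refs
    for key, value in auto_map.items():
        # Some entries hold a list of references, possibly containing None.
        candidates = value if isinstance(value, (list, tuple)) else [value]
        for ref in candidates:
            if isinstance(ref, str) and "--" in ref:
                refs.append((key, ref))
    return refs


def scan_model_dir(root: str) -> dict:
    """Map each config.json under root to its cross-repo auto_map references."""
    findings = {}
    for path in Path(root).rglob("config.json"):
        try:
            config = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed file: skip in this sketch
        if isinstance(config, dict):
            refs = external_auto_map_refs(config)
            if refs:
                findings[str(path)] = refs
    return findings
```

Any hit deserves manual review of the referenced repository. A local auto_map entry without a repo prefix still indicates custom code shipped inside the model repository, so a stricter audit might flag every auto_map key rather than only the cross-repo ones.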

Exploitation Scenario

An adversary registers two GitHub or Hugging Face accounts. The first hosts a legitimate-looking multimodal model repository (the frontend repo) with a well-crafted README, model card, and config.json. The config.json includes an auto_map field pointing to the adversary's second repository (the backend repo), which hosts a malicious Python class. The frontend repo is promoted in AI/ML communities, referenced in blog posts, or submitted to model leaderboards to build credibility.

A target organization's MLOps pipeline or developer then runs vllm.LLM('attacker/benign-model', trust_remote_code=False). The False flag is ignored: vLLM resolves the auto_map reference with get_class_from_dynamic_module, fetches the malicious Python from the backend repo, and executes it on the inference host. The payload can drop a reverse shell, exfiltrate environment variables (AWS credentials, OpenAI keys, internal API tokens), or install a persistent backdoor. The entire compromise happens silently before any inference request is processed.

Weaknesses (CWE)

CWE-94 — Improper Control of Generation of Code ('Code Injection'): The product constructs all or part of a code segment using externally-influenced input from an upstream component, but it does not neutralize or incorrectly neutralizes special elements that could modify the syntax or behavior of the intended code segment.

  • [Architecture and Design] Refactor your program so that you do not have to dynamically generate code.
  • [Architecture and Design] Run your code in a "jail" or similar sandbox environment that enforces strict boundaries between the process and the operating system. This may effectively restrict which code can be executed by your product. Examples include the Unix chroot jail and AppArmor. In general, managed code may provide some protection. This may not be a feasible solution, and it only limits the impact to the operating system; the rest of your application may still be subject to compromise. Be careful to avoid CWE-243 and other weaknesses related to jails.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
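
The 8.8 base score can be reproduced from this vector with the CVSS v3.1 base-score equations. The sketch below is an illustration rather than a replacement for an official calculator: the metric weights and round-up rule are taken from the v3.1 specification, the function names are hypothetical, and only base metrics are handled.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place using the spec's integer-based method."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a vector string."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(part.split(":", 1) for part in parts[1:])
    scope = m["S"]
    # Impact sub-score (ISS) combines the three impact metrics.
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 8.8
```

For this vector the impact term is about 5.87 and the exploitability term about 2.84; their sum of roughly 8.71 rounds up to 8.8. Changing UI:R to UI:N in the same vector yields 9.8, which shows how much the user-interaction requirement (someone must load the model) lowers the score.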

Timeline

Published
December 1, 2025
Last Modified
December 3, 2025
First Seen
December 1, 2025

Related Vulnerabilities