PolyJailbreak: Cross-Modal Jailbreaking Attacks on Black-Box Multimodal LLMs

Xinkai Wang, Beibei Li, Zerui Shao, Ao Liu, Guangquan Xu, Shouling Ji

arXiv category: cs.CR (Cryptography and Security)
Published: October 20, 2025
Updated: March 7, 2026
Source: arXiv (PDF)

Abstract

Multimodal large language models (MLLMs) have become integral to a wide range of real-world applications by jointly reasoning over text and visual inputs. However, despite recent advances in safety alignment, MLLMs remain vulnerable to jailbreak attacks, where carefully crafted inputs can bypass safety mechanisms and elicit harmful responses. In this work, we investigate the security vulnerabilities of MLLMs in text-vision scenarios and propose a novel black-box jailbreak framework, named PolyJailbreak. We first identify a phenomenon, termed multimodal safety asymmetry, where visual alignment introduces uneven safety constraints across modalities and weakens overall robustness. We analyze attention dynamics and latent representations in MLLMs, revealing that visual inputs can disrupt cross-modal information flow and reduce the model's ability to separate benign and malicious intents. Motivated by these findings, we propose PolyJailbreak, which organizes the discovered vulnerabilities into a structured library of reusable Atomic Strategy Primitives to enable step-wise transformations from harmful intents to effective jailbreak inputs. Guided by these primitives, a reinforcement learning-based multi-agent optimization process automatically adapts attacks to the target model without access to internal parameters. Extensive experiments on a wide range of MLLMs demonstrate that PolyJailbreak consistently outperforms state-of-the-art jailbreak baselines, with an average improvement of 18.15% in attack success rate and a success rate exceeding 95% on commercial black-box models, including GPT-4o and Gemini.
