Tool HIGH relevance

HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models

Sidhant Narula Javad Rafiei Asl Mohammad Ghasemigol Eduardo Blanco Daniel Takabi

cs.CR cs.AI

Published

October 21, 2025

Updated

October 21, 2025

Links

PDF arxiv

Abstract

Large Language Models (LLMs) remain vulnerable to multi-turn jailbreak attacks. We introduce HarmNet, a modular framework comprising ThoughtNet, a hierarchical semantic network; a feedback-driven Simulator for iterative query refinement; and a Network Traverser for real-time adaptive attack execution. HarmNet systematically explores and refines the adversarial space to uncover stealthy, high-success attack paths. Experiments across closed-source and open-source LLMs show that HarmNet outperforms state-of-the-art methods, achieving higher attack success rates. For example, on Mistral-7B, HarmNet achieves a 99.4% attack success rate, 13.9% higher than the best baseline. Index terms: jailbreak attacks; large language models; adversarial framework; query refinement.

Metadata

Comment: This paper has been accepted for presentation at the Conference on Applied Machine Learning in Information Security (CAMLIS 2025)

Pro Analysis

Full threat analysis, ATLAS technique mapping, compliance impact assessment (ISO 42001, EU AI Act), and actionable recommendations are available with a Pro subscription.

Threat Deep-Dive

ATLAS Mapping

Compliance Reports

Actionable Recommendations

Start 14-Day Free Trial

Back to Research