AI Security Research

2,560+ academic papers on AI security, attacks, and defenses

Total

2,560

Attack

982

Benchmark

736

Defense

350

Tool

275

Survey

144

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 521–540 of 727 papers

Clear filters

Attack HIGH

Knowledge-Driven Multi-Turn Jailbreaking on Large Language Models

Songze Li, Ruishi He, Xiaojun Jia +2 more

Large Language Models (LLMs) face a significant threat from multi-turn jailbreak attacks, where adversaries progressively steer conversations to...

4 months ago cs.CR cs.LG PDF

Attack HIGH

Multi-turn Jailbreaking Attack in Multi-Modal Large Language Models

Badhan Chandra Das, Md Tasnim Jawad, Joaquin Molto +2 more

In recent years, the security vulnerabilities of Multi-modal Large Language Models (MLLMs) have become a serious concern in the Generative Artificial...

4 months ago cs.CR cs.AI PDF

Attack MEDIUM

Effects of personality steering on cooperative behavior in Large Language Model agents

Mizuki Sakai, Mizuki Yokoyama, Wakaba Tateishi +1 more

Large language models (LLMs) are increasingly used as autonomous agents in strategic and social interactions. Although recent studies suggest that...

4 months ago cs.AI PDF

Attack HIGH

Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning

Zhiyuan Chang, Mingyang Li, Yuekai Huang +6 more

Large language model (LLM)-integrated applications have become increasingly prevalent, yet face critical security vulnerabilities from prompt...

4 months ago cs.AI cs.CR PDF

Attack HIGH

Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks

Hoagy Cunningham, Jerry Wei, Zihan Wang +26 more

We introduce enhanced Constitutional Classifiers that deliver production-grade jailbreak robustness with dramatically reduced computational costs and...

4 months ago cs.CR cs.AI PDF

Attack MEDIUM

Deep Dive into the Abuse of DL APIs To Create Malicious AI Models and How to Detect Them

Mohamed Nabeel, Oleksii Starov

According to Gartner, more than 70% of organizations will have integrated AI models into their workflows by the end of 2025. In order to reduce cost...

4 months ago cs.CR PDF

Attack MEDIUM

Merging Triggers, Breaking Backdoors: Defensive Poisoning for Instruction-Tuned Language Models

San Kim, Gary Geunbae Lee

Large Language Models (LLMs) have greatly advanced Natural Language Processing (NLP), particularly through instruction tuning, which enables broad...

4 months ago cs.CL cs.AI PDF

Attack HIGH

Large Language Models for Detecting Cyberattacks on Smart Grid Protective Relays

Ahmad Mohammad Saber, Saeed Jafari, Zhengmao Ouyang +3 more

This paper presents a large language model (LLM)-based framework that adapts and fine-tunes compact LLMs for detecting cyberattacks on transformer...

4 months ago cs.CR cs.LG eess.SP PDF

Attack HIGH

MiJaBench: Revealing Minority Biases in Large Language Models via Hate Speech Jailbreaking

Iago Alves Brito, Walcy Santos Rezende Rios, Julia Soares Dollis +2 more

Current safety evaluations of large language models (LLMs) create a dangerous illusion of universality, aggregating "Identity Hate" into scalar...

4 months ago cs.CL cs.AI PDF

Attack HIGH

SearchAttack: Red-Teaming LLMs against Knowledge-to-Action Threats under Online Web Search

Yu Yan, Sheng Sun, Mingfeng Li +6 more

Recently, people have suffered from LLM hallucination and have become increasingly aware of the reliability gap of LLMs in open and...

4 months ago cs.CL PDF

Attack HIGH

HoneyTrap: Deceiving Large Language Model Attackers to Honeypot Traps with Resilient Multi-Agent Defense

Siyuan Li, Xi Lin, Jun Wu +5 more

Jailbreak attacks pose significant threats to large language models (LLMs), enabling attackers to bypass safeguards. However, existing reactive...

4 months ago cs.CR cs.AI PDF

Attack HIGH

State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space

Ji Guo, Wenbo Jiang, Yansong Lin +7 more

Vision-Language-Action (VLA) models are widely deployed in safety-critical embodied AI applications such as robotics. However, their complex...

4 months ago cs.CR cs.LG PDF

Attack HIGH

Inhibitory Attacks on Backdoor-based Fingerprinting for Large Language Models

Hang Fu, Wanli Peng, Yinghan Zhou +3 more

The widespread adoption of Large Language Model (LLM) in commercial and research settings has intensified the need for robust intellectual property...

4 months ago cs.CR PDF

Attack HIGH

Analyzing Reasoning Shifts in Audio Deepfake Detection under Adversarial Attacks: The Reasoning Tax versus Shield Bifurcation

Binh Nguyen, Thai Le

Audio Language Models (ALMs) offer a promising shift towards explainable audio deepfake detections (ADDs), moving beyond \textit{black-box}...

4 months ago cs.CL cs.SD eess.AS PDF

Attack HIGH

ALERT: Zero-shot LLM Jailbreak Detection via Internal Discrepancy Amplification

Xiao Lin, Philip Li, Zhichen Zeng +6 more

Despite rich safety alignment strategies, large language models (LLMs) remain highly susceptible to jailbreak attacks, which compromise safety...

4 months ago cs.LG cs.AI cs.IR PDF

Attack HIGH

Jailbreaking LLMs Without Gradients or Priors: Effective and Transferable Attacks

Zhakshylyk Nurlanov, Frank R. Schmidt, Florian Bernard

As Large Language Models (LLMs) are increasingly deployed in safety-critical domains, rigorously evaluating their robustness against adversarial...

4 months ago cs.LG cs.AI cs.CR PDF

Attack MEDIUM

Enhancing Moral Diagnosis and Correction in Large Language Models

Bocheng Chen, Xi Chen, Han Zi +5 more

Identifying specific moral errors in an input and generating appropriate corrections require moral sensitivity in large language models (LLMs), which...

4 months ago cs.CL PDF

Attack HIGH

JPU: Bridging Jailbreak Defense and Unlearning via On-Policy Path Rectification

Xi Wang, Songlei Jian, Shasha Li +5 more

Despite extensive safety alignment, Large Language Models (LLMs) often fail against jailbreak attacks. While machine unlearning has emerged as a...

4 months ago cs.CR cs.AI PDF

Attack HIGH

Window-based Membership Inference Attacks Against Fine-tuned Large Language Models

Yuetian Chen, Yuntao Du, Kaiyuan Zhang +4 more

Most membership inference attacks (MIAs) against Large Language Models (LLMs) rely on global signals, like average loss, to identify training data....

4 months ago cs.CL cs.AI cs.CR PDF

Attack HIGH

Adversarial Contrastive Learning for LLM Quantization Attacks

Dinghong Song, Zhiwei Xu, Hai Wan +3 more

Model quantization is critical for deploying large language models (LLMs) on resource-constrained hardware, yet recent work has revealed severe...

4 months ago cs.CR cs.LG PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial