AI Security Research

2,583+ academic papers on AI security, attacks, and defenses

Total: 2,583
Attack: 994
Benchmark: 740
Defense: 355
Tool: 275
Survey: 146

Benchmark HIGH

The Anatomy of Conversational Scams: A Topic-Based Red Teaming Analysis of Multi-Turn Interactions in LLMs

Xiangzhe Yuan, Zhenhao Zhang, Haoming Tang +1 more

As LLMs gain persuasive agentic capabilities through extended dialogues, they introduce novel risks in multi-turn conversational scams that...

4 months ago cs.CL PDF

Defense HIGH

JPU: Bridging Jailbreak Defense and Unlearning via On-Policy Path Rectification

Xi Wang, Songlei Jian, Shasha Li +5 more

Despite extensive safety alignment, Large Language Models (LLMs) often fail against jailbreak attacks. While machine unlearning has emerged as a...

4 months ago cs.CR cs.AI PDF

Attack HIGH

Window-based Membership Inference Attacks Against Fine-tuned Large Language Models

Yuetian Chen, Yuntao Du, Kaiyuan Zhang +4 more

Most membership inference attacks (MIAs) against Large Language Models (LLMs) rely on global signals, like average loss, to identify training data....

4 months ago cs.CL cs.AI cs.CR PDF

Attack HIGH

Adversarial Contrastive Learning for LLM Quantization Attacks

Dinghong Song, Zhiwei Xu, Hai Wan +3 more

Model quantization is critical for deploying large language models (LLMs) on resource-constrained hardware, yet recent work has revealed severe...

4 months ago cs.CR cs.LG PDF

Defense HIGH

TRYLOCK: Defense-in-Depth Against LLM Jailbreaks via Layered Preference and Representation Engineering

Scott Thornton

Large language models remain vulnerable to jailbreak attacks, and single-layer defenses often trade security for usability. We present TRYLOCK, the...

4 months ago cs.CR cs.LG PDF

Attack HIGH

Multi-Turn Jailbreaking of Aligned LLMs via Lexical Anchor Tree Search

Devang Kulshreshtha, Hang Su, Chinmay Hegde +1 more

Most jailbreak methods achieve high attack success rates (ASR) but require attacker LLMs to craft adversarial queries and/or demand high query...

4 months ago cs.CL PDF

Attack HIGH

Hidden State Poisoning Attacks against Mamba-based Language Models

Alexandre Le Mercier, Chris Develder, Thomas Demeester

State space models (SSMs) like Mamba offer efficient alternatives to Transformer-based language models, with linear time complexity. Yet, their...

4 months ago cs.CL PDF

Defense HIGH

Focus on What Matters: Fisher-Guided Adaptive Multimodal Fusion for Vulnerability Detection

Yun Bian, Yi Chen, HaiQuan Wang +2 more

Software vulnerability detection can be formulated as a binary classification problem that determines whether a given code snippet contains security...

4 months ago cs.SE cs.AI cs.CR PDF

Tool HIGH

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Xin Wang, Yunhao Chen, Juncheng Li +7 more

The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety...

4 months ago cs.CR cs.CV PDF

Benchmark HIGH

How Real is Your Jailbreak? Fine-grained Jailbreak Evaluation with Anchored Reference

Songyang Liu, Chaozhuo Li, Rui Pu +5 more

Jailbreak attacks present a significant challenge to the safety of Large Language Models (LLMs), yet current automated evaluation methods largely...

4 months ago cs.CR cs.CL PDF

Attack HIGH

Emoji-Based Jailbreaking of Large Language Models

M P V S Gopinadh, S Mahaboob Hussain

Large Language Models (LLMs) are integral to modern AI applications, but their safety alignment mechanisms can be bypassed through adversarial prompt...

4 months ago cs.CR cs.AI PDF

Attack HIGH

Low Rank Comes with Low Security: Gradient Assembly Poisoning Attacks against Distributed LoRA-based LLM Systems

Yueyan Dong, Minghui Xu, Qin Hu +5 more

Low-Rank Adaptation (LoRA) has become a popular solution for fine-tuning large language models (LLMs) in federated settings, dramatically reducing...

4 months ago cs.CR PDF

Attack HIGH

Engineering Attack Vectors and Detecting Anomalies in Additive Manufacturing

Md Mahbub Hasan, Marcus Sternhagen, Krishna Chandra Roy

Additive manufacturing (AM) is rapidly integrating into critical sectors such as aerospace, automotive, and healthcare. However, this cyber-physical...

4 months ago cs.CR cs.AI cs.LG PDF

Benchmark HIGH

An Empirical Evaluation of LLM-Based Approaches for Code Vulnerability Detection: RAG, SFT, and Dual-Agent Systems

Md Hasan Saju, Maher Muhtadi, Akramul Azim

The rapid advancement of Large Language Models (LLMs) presents new opportunities for automated software vulnerability detection, a crucial task in...

4 months ago cs.SE cs.AI PDF

Attack HIGH

Overlooked Safety Vulnerability in LLMs: Malicious Intelligent Optimization Algorithm Request and its Jailbreak

Haoran Gu, Handing Wang, Yi Mei +2 more

The widespread deployment of large language models (LLMs) has raised growing concerns about their misuse risks and associated safety issues. While...

4 months ago cs.CR cs.CL PDF

Attack HIGH

Large Empirical Case Study: Go-Explore adapted for AI Red Team Testing

Manish Bhatt, Adrian Wood, Idan Habler +1 more

Production LLM agents with tool-using capabilities require security testing despite their safety training. We adapt Go-Explore to evaluate...

4 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

GCG Attack On A Diffusion LLM

Ruben Neyroud, Sam Corley

While most LLMs are autoregressive, diffusion-based LLMs have recently emerged as an alternative method for generation. Greedy Coordinate Gradient...

4 months ago cs.LG cs.CL cs.CR PDF

Benchmark HIGH

Language Model Agents Under Attack: A Cross Model-Benchmark of Profit-Seeking Behaviors in Customer Service

Jingyu Zhang

Customer-service LLM agents increasingly make policy-bound decisions (refunds, rebooking, billing disputes), but the same "helpful" interaction...

4 months ago cs.CR cs.HC PDF

Attack HIGH

Jailbreaking Attacks vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race?

Yuan Xin, Dingfan Chen, Linyi Yang +2 more

As large language models (LLMs) are increasingly deployed, ensuring their safe use is paramount. Jailbreaking, adversarial prompts that bypass model...

4 months ago cs.CR cs.AI cs.CL PDF
