Benchmark LOW
Raquib Bin Yousuf, Aadyant Khatri, Shengzhe Xu +2 more
Recently proposed evaluation benchmarks aim to characterize the effective context length and the forgetting tendencies of large language models...
7 months ago cs.CL cs.AI cs.LG
PDF

Attack MEDIUM
Fatmazohra Rezkellah, Ramzi Dakhmouche
With the increasing adoption of Large Language Models (LLMs), more customization is needed to ensure privacy-preserving and safe generation. We...
7 months ago cs.LG cs.CL cs.CR
PDF

Benchmark MEDIUM
Kartik Pandit, Sourav Ganguly, Arnesh Banerjee +2 more
Ensuring safety is a foundational requirement for large language models (LLMs). Achieving an appropriate balance between enhancing the utility of...
7 months ago cs.LG cs.AI eess.SY
PDF

Attack HIGH
Javad Rafiei Asl, Sidhant Narula, Mohammad Ghasemigol +2 more
Large Language Models (LLMs) have revolutionized natural language processing but remain vulnerable to jailbreak attacks, especially multi-turn...
7 months ago cs.CR cs.AI
PDF

Attack HIGH
Sanket Badhe
We present LegalSim, a modular multi-agent simulation of adversarial legal proceedings that explores how AI systems can exploit procedural weaknesses...
7 months ago cs.MA cs.AI cs.CR
PDF

Benchmark MEDIUM
Imene Kerboua, Sahar Omidi Shayegan, Megh Thakkar +7 more
Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals; these pages often exceed tens...

Attack HIGH
Xinzhe Huang, Wenjing Hu, Tianhang Zheng +5 more
Existing gradient-based jailbreak attacks on Large Language Models (LLMs) typically optimize adversarial suffixes to align the LLM output with...
7 months ago cs.CR cs.AI
PDF

Attack HIGH
Yu He, Yifei Chen, Yiming Li +5 more
In recent years, RAG has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG...

Benchmark MEDIUM
Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru +6 more
While finetuning AI agents on interaction data -- such as web browsing or tool use -- improves their capabilities, it also introduces critical...
7 months ago cs.CR cs.AI cs.LG
PDF

Benchmark MEDIUM
Nikoo Naghavian, Mostafa Tavassolipour
Vision-language models like CLIP demonstrate impressive zero-shot generalization but remain highly vulnerable to adversarial attacks. In this work,...

Attack HIGH
Zhixin Xie, Xurui Song, Jun Luo
Despite substantial efforts in safety alignment, recent research indicates that Large Language Models (LLMs) remain highly susceptible to jailbreak...

Attack MEDIUM
Abrar Shahid, Ibteeker Mahir Ishum, AKM Tahmidul Haque +2 more
This paper presents a controlled study of adversarial reinforcement learning in network security through a custom OpenAI Gym environment that models...
7 months ago cs.LG cs.AI cs.CR
PDF

Defense MEDIUM
Lesly Miculicich, Mihir Parmar, Hamid Palangi +4 more
The deployment of autonomous AI agents in sensitive domains, such as healthcare, introduces critical risks to safety, security, and privacy. These...
7 months ago cs.SE cs.AI cs.CR
PDF

Attack HIGH
Chinthana Wimalasuriya, Spyros Tragoudas
Adversarial attacks present a significant threat to modern machine learning systems. Yet, existing detection methods often lack the ability to detect...
7 months ago cs.CR cs.CV cs.LG
PDF

Tool MEDIUM
Bowei Ning, Xuejun Zong, Kan He
Industrial control systems (ICS) are vital to modern infrastructure but increasingly vulnerable to cybersecurity threats, particularly through...

Attack HIGH
Zhaorun Chen, Xun Liu, Mintong Kang +4 more
As vision-language models (VLMs) gain prominence, their multimodal interfaces also introduce new safety vulnerabilities, making the safety evaluation...
7 months ago cs.AI cs.LG
PDF

Benchmark HIGH
Chengquan Guo, Chulin Xie, Yu Yang +6 more
Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic...

Benchmark MEDIUM
Chenpei Huang, Lingfeng Yao, Hui Zhong +5 more
Ear canal scanning/sensing (ECS) has emerged as a novel biometric authentication method for mobile devices paired with wireless earbuds. Existing...
7 months ago cs.CR cs.HC
PDF

Tool HIGH
Jonathan Sneh, Ruomei Yan, Jialin Yu +6 more
As LLMs increasingly power agents that interact with external tools, tool use has become an essential mechanism for extending their capabilities....
7 months ago cs.CR cs.AI
PDF

Attack HIGH
Ruohao Guo, Afshin Oroojlooy, Roshan Sridhar +3 more
Despite recent rapid progress in AI safety, current large language models remain vulnerable to adversarial attacks in multi-turn interaction...
7 months ago cs.LG cs.AI cs.CL
PDF