Benchmark MEDIUM
Matheus Vinicius da Silva de Oliveira, Jonathan de Andrade Silva, Awdren de Lima Fontao
Large Language Models (LLMs) are widely used across multiple domains but continue to raise concerns regarding security and fairness. Beyond known...
7 months ago cs.AI cs.IR cs.LG
Other LOW
Zheng Zhang, Ziwei Shan, Kaitao Song +2 more
Process Reward Models (PRMs) have emerged as a promising approach to enhance the reasoning capabilities of large language models (LLMs) by guiding...
Attack MEDIUM
Firas Ben Hmida, Abderrahmen Amich, Ata Kaboudi +1 more
Deep neural networks (DNNs) are increasingly being deployed in high-stakes applications, from self-driving cars to biometric authentication. However,...
7 months ago cs.CR cs.LG
Benchmark LOW
Seiji Maekawa, Jackson Hassell, Pouya Pezeshkpour +2 more
Existing benchmarks for tool-augmented language models (TaLMs) lack fine-grained control over task difficulty and remain vulnerable to data...
7 months ago cs.CL cs.PL
Benchmark LOW
Yao Tong, Haonan Wang, Siquan Li +2 more
Fingerprinting Large Language Models (LLMs) is essential for provenance verification and model attribution. Existing methods typically extract...
7 months ago cs.CR cs.AI cs.CL
Attack LOW
Shuai Shao, Qihan Ren, Chen Qian +8 more
Advances in Large Language Models (LLMs) have enabled a new class of self-evolving agents that autonomously improve through interaction with the...
7 months ago cs.AI cs.CL cs.LG
Attack HIGH
Qinjian Zhao, Jiaqi Wang, Zhiqiang Gao +3 more
Large Language Models (LLMs) have achieved impressive performance across diverse natural language processing tasks, but their growing power also...
Benchmark LOW
Joel Dyer, Daniel Jarne Ornia, Nicholas Bishop +2 more
Evaluating the safety of frontier AI systems is an increasingly important concern, helping to measure the capabilities of such models and identify...
7 months ago cs.LG cs.AI stat.ML
Benchmark LOW
Yixu Wang, Xin Wang, Yang Yao +4 more
The rapid integration of Large Language Models (LLMs) into high-stakes domains necessitates reliable safety and compliance evaluation. However,...
Attack HIGH
Xiaobao Wang, Ruoxiao Sun, Yujun Zhang +4 more
Graph Neural Networks (GNNs) have demonstrated strong performance across tasks such as node classification, link prediction, and graph...
7 months ago cs.LG cs.CR
Attack MEDIUM
Marco Zimmerli, Andreas Plesner, Till Aczel +1 more
Deep neural networks remain vulnerable to adversarial examples despite advances in architectures and training paradigms. We investigate how training...
7 months ago cs.CV cs.AI cs.CR
Attack MEDIUM
Dennis Jacob, Emad Alghamdi, Zhanhao Hu +2 more
Large language models (LLMs) have become increasingly popular due to their ability to interact with unstructured content. As such, LLMs are now a key...
7 months ago cs.CR cs.LG
Survey MEDIUM
Guolei Huang, Qinzhi Peng, Gan Xu +3 more
As Vision-Language Models (VLMs) move into interactive, multi-turn use, safety concerns intensify for multimodal multi-turn dialogue, which is...
Benchmark HIGH
Simin Chen, Yixin He, Suman Jana +1 more
LLM-based agents are increasingly deployed for software maintenance tasks such as automated program repair (APR). APR agents automatically fetch...
Benchmark LOW
Ruolin Chen, Yinqian Sun, Jihang Wang +3 more
Embodied agents powered by large language models (LLMs) inherit advanced planning capabilities; however, their direct interaction with the physical...
Attack HIGH
Yein Park, Jungwoo Park, Jaewoo Kang
Large language models (LLMs), despite being safety-aligned, exhibit brittle refusal behaviors that can be circumvented by simple linguistic changes....
Benchmark LOW
Xiang Zhang, Kun Wei, Xu Yang +3 more
As Large Language Models (LLMs) become increasingly prevalent, their security vulnerabilities have already drawn attention. Machine unlearning is...
7 months ago cs.LG cs.CL
Other LOW
Juyeop Kim, Songkuk Kim, Jong-Seok Lee
Despite their success in image generation, diffusion models can memorize training data, raising serious privacy and copyright concerns. Although...
Tool HIGH
Jing-Jing Li, Jianfeng He, Chao Shang +6 more
As LLMs advance into autonomous agents with tool-use capabilities, they introduce security challenges that extend beyond traditional content-based...
7 months ago cs.CR cs.AI cs.CL
Defense LOW
Boyang Zhang, Istemi Ekin Akkus, Ruichuan Chen +4 more
Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in processing and reasoning over diverse modalities, but their...
7 months ago cs.CR cs.LG