AI Security Research

2,560+ academic papers on AI security, attacks, and defenses

Total

2,560

Attack

982

Benchmark

736

Defense

350

Tool

275

Survey

144

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 281–300 of 557 papers

Clear filters

Benchmark MEDIUM

Moral Sycophancy in Vision Language Models

Shadman Rabby, Md. Hefzul Hossain Papon, Sabbir Ahmed +3 more

Sycophancy in Vision-Language Models (VLMs) refers to their tendency to align with user opinions, often at the expense of moral or factual accuracy....

3 months ago cs.AI PDF

Benchmark HIGH

CyberExplorer: Benchmarking LLM Offensive Security Capabilities in a Real-World Attacking Simulation Environment

Nanda Rani, Kimberly Milner, Minghao Shao +9 more

Real-world offensive security operations are inherently open-ended: attackers explore unknown attack surfaces, revise hypotheses under uncertainty,...

3 months ago cs.CR cs.AI cs.MA PDF

Benchmark LOW

Blind to the Human Touch: Overlap Bias in LLM-Based Summary Evaluation

Jiangnan Fang, Cheng-Tse Liu, Hanieh Deilamsalehy +5 more

Large language model (LLM) judges have often been used alongside traditional, algorithm-based metrics for tasks like summarization because they...

3 months ago cs.CL PDF

Benchmark MEDIUM

Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents

Sai Puppala, Ismail Hossain, Md Jahangir Alam +5 more

Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety...

3 months ago cs.CR cs.AI PDF

Benchmark HIGH

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

Tianyi Wu, Mingzhe Du, Yue Liu +4 more

Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to...

3 months ago cs.CR cs.AI cs.CL PDF

Benchmark MEDIUM

NAAMSE: Framework for Evolutionary Security Evaluation of Agents

Kunal Pai, Parth Shah, Harshil Patel

AI agents are increasingly deployed in production, yet their security evaluations remain bottlenecked by manual red-teaming or static benchmarks that...

3 months ago cs.AI cs.MA PDF

Benchmark MEDIUM

Aegis: Towards Governance, Integrity, and Security of AI Voice Agents

Xiang Li, Pin-Yu Chen, Wenqi Wei

With the rapid advancement and adoption of Audio Large Language Models (ALLMs), voice agents are now being deployed in high-stakes domains such as...

3 months ago cs.CR cs.MA PDF

Benchmark MEDIUM

Beyond Crash: Hijacking Your Autonomous Vehicle for Fun and Profit

Qi Sun, Ahmed Abdo, Luis Burbano +4 more

Autonomous Vehicles (AVs), especially vision-based AVs, are rapidly being deployed without human operators. As AVs operate in safety-critical...

3 months ago cs.CR cs.LG PDF

Benchmark HIGH

Evaluating and Enhancing the Vulnerability Reasoning Capabilities of Large Language Models

Li Lu, Yanjie Zhao, Hongzhou Rao +2 more

Large Language Models (LLMs) have demonstrated remarkable proficiency in vulnerability detection. However, a critical reliability gap persists:...

3 months ago cs.CR PDF

Benchmark MEDIUM

Confundo: Learning to Generate Robust Poison for Practical RAG Systems

Haoyang Hu, Zhejun Jiang, Yueming Lyu +3 more

Retrieval-augmented generation (RAG) is increasingly deployed in real-world applications, where its reference-grounded design makes outputs appear...

3 months ago cs.CR cs.LG PDF

Benchmark MEDIUM

Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

Yi Liu, Zhihao Chen, Yanjun Zhang +5 more

Third-party agent skills extend LLM-based agents with instruction files and executable code that run on users' machines. Skills execute with user...

3 months ago cs.CR cs.AI cs.CL PDF

Benchmark HIGH

MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs

Junhyeok Lee, Han Jang, Kyu Sung Choi

Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly integrated into clinical workflows; however, prompt...

3 months ago cs.CL cs.LG PDF

Benchmark MEDIUM

Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in Inference-Time Interventions

Navita Goyal, Hal Daumé

Model steering, which involves intervening on hidden representations at inference time, has emerged as a lightweight alternative to finetuning for...

3 months ago cs.LG cs.AI cs.CL PDF

Benchmark MEDIUM

Private and interpretable clinical prediction with quantum-inspired tensor train models

José Ramón Pareja Monturiol, Juliette Sinnott, Roger G. Melko +1 more

Machine learning in clinical settings must balance predictive accuracy, interpretability, and privacy. Models such as logistic regression (LR) offer...

3 months ago cs.LG cs.CR quant-ph PDF

Benchmark LOW

CASTLE: A Comprehensive Benchmark for Evaluating Student-Tailored Personalized Safety in Large Language Models

Rui Jia, Ruiyi Lan, Fengrui Liu +7 more

Large language models (LLMs) have advanced the development of personalized learning in education. However, their inherent generation mechanisms often...

3 months ago cs.CL PDF

Benchmark LOW

Beyond single-channel agentic benchmarking

Nelu D. Radpour

Contemporary benchmarks for agentic artificial intelligence (AI) frequently evaluate safety through isolated task-level accuracy thresholds,...

3 months ago cs.CY cs.AI cs.HC PDF

Benchmark MEDIUM

Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?

Ruixin Yang, Ethan Mendes, Arthur Wang +4 more

Vision-language models (VLMs) have demonstrated strong performance in image geolocation, a capability further sharpened by frontier multimodal large...

3 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases

Casey Ford, Madison Van Doren, Emily Dix

Multimodal large language models (MLLMs) are increasingly deployed in real-world systems, yet their safety under adversarial prompting remains...

3 months ago cs.CL cs.AI cs.HC PDF

Benchmark LOW

From Data to Behavior: Predicting Unintended Model Behaviors Before Training

Mengru Wang, Zhenqian Xu, Junfeng Fang +4 more

Large Language Models (LLMs) can acquire unintended biases from seemingly benign training data even without explicit cues or malicious content....

3 months ago cs.LG cs.AI cs.CL PDF

Benchmark MEDIUM

Trust The Typical

Debargha Ganguly, Sreehari Sankar, Biyao Zhang +8 more

Current approaches to LLM safety fundamentally rely on a brittle cat-and-mouse game of identifying and blocking known threats via guardrails. We...

3 months ago cs.CL cs.AI cs.DC PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial