LLMs + Security = Trouble
Benjamin Livshits
We argue that when it comes to producing secure code with AI, the prevailing "fighting fire with fire" approach -- using probabilistic AI-based...
Yuhang Wang, Feiming Xu, Zheng Lin +6 more
Although large language model (LLM)-based agents, exemplified by OpenClaw, are increasingly evolving from task-oriented systems into personalized AI...
Liwen Wang, Zongjie Li, Yuchong Xie +4 more
The evolution of Large Language Models (LLMs) into agentic systems that perform autonomous reasoning and tool use has created significant...
Shadman Rabby, Md. Hefzul Hossain Papon, Sabbir Ahmed +3 more
Sycophancy in Vision-Language Models (VLMs) refers to their tendency to align with user opinions, often at the expense of moral or factual accuracy....
Xiaoxu Peng, Dong Zhou, Jianwen Zhang +3 more
Vision Language Models (VLMs) have advanced perception in autonomous driving (AD), but they remain vulnerable to adversarial threats. These risks...
Sahar Zargarzadeh, Mohammad Islam
The Internet of Things (IoT) has revolutionized connectivity by linking billions of devices worldwide. However, this rapid expansion has also...
Pengyu Chang, Yixiong Fang, Silin Chen +3 more
Software testing is a critical, yet resource-intensive phase of the software development lifecycle. Over the years, various automated tools have been...
Md Rafi Ur Rashid, MD Sadik Hossain Shanto, Vishnu Asutosh Dasu +1 more
Vision-Language Models (VLMs) are now a core part of modern AI. Recent work has proposed several visual jailbreak attacks using single/holistic images....
Shayan Ali Hassan, Tao Ni, Zafar Ayyub Qazi +1 more
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and generation. However, these...
Nanda Rani, Kimberly Milner, Minghao Shao +9 more
Real-world offensive security operations are inherently open-ended: attackers explore unknown attack surfaces, revise hypotheses under uncertainty,...
Minbeom Kim, Mihir Parmar, Phillip Wallis +5 more
AI agents equipped with tool-calling capabilities are susceptible to Indirect Prompt Injection (IPI) attacks. In this attack scenario, malicious...
Tianyi Wang, Huawei Fan, Yuanchao Shu +2 more
Large Language Models face an emerging and critical threat known as latency attacks. Because LLM inference is inherently expensive, even modest...
Jiangnan Fang, Cheng-Tse Liu, Hanieh Deilamsalehy +5 more
Large language model (LLM) judges have often been used alongside traditional, algorithm-based metrics for tasks like summarization because they...
Cen Zhang, Younggi Park, Fabian Fleischer +20 more
DARPA's AI Cyber Challenge (AIxCC, 2023–2025) is the largest competition to date for building fully autonomous cyber reasoning systems (CRSs) that...
Sai Puppala, Ismail Hossain, Md Jahangir Alam +5 more
Large language models are increasingly deployed as "deep agents" that plan, maintain persistent state, and invoke external tools, shifting safety...
Yuhao Wang, Shengfang Zhai, Guanghao Jin +3 more
Large Language Model (LLM)-based agents employ external and internal memory systems to handle complex, goal-oriented tasks, yet this exposes them to...
Zhiyu Sun, Minrui Luo, Yu Wang +2 more
Large language models (LLMs) are pretrained on corpora containing trillions of tokens and, therefore, inevitably memorize sensitive information....
Tianyi Wu, Mingzhe Du, Yue Liu +4 more
Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to...
Ruoyao Wen, Hao Li, Chaowei Xiao +1 more
Indirect prompt injection threatens LLM agents by embedding malicious instructions in external content, enabling unauthorized actions and data theft....
Kunal Pai, Parth Shah, Harshil Patel
AI agents are increasingly deployed in production, yet their security evaluations remain bottlenecked by manual red-teaming or static benchmarks that...