AI Security Research

2,583+ academic papers on AI security, attacks, and defenses

Total

2,583

Attack

994

Benchmark

740

Defense

355

Tool

275

Survey

146

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 321–340 of 890 papers

Clear filters

Attack HIGH

When Skills Lie: Hidden-Comment Injection in LLM Agents

Qianli Wang, Boyang Ma, Minghui Xu +1 more

LLM agents often rely on Skills to describe available tools and recommended procedures. We study a hidden-comment prompt injection risk in this...

3 months ago cs.CR PDF

Survey HIGH

The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis

Peiran Wang, Xinfeng Li, Chong Xiang +5 more

The evolution of Large Language Models (LLMs) has resulted in a paradigm shift towards autonomous agents, necessitating robust security against...

3 months ago cs.CR cs.CL PDF

Attack HIGH

Detecting Jailbreak Attempts in Clinical Training LLMs Through Automated Linguistic Feature Extraction

Tri Nguyen, Huy Hoang Bao Le, Lohith Srikanth Pentapalli +2 more

Detecting jailbreak attempts in clinical training large language models (LLMs) requires accurate modeling of linguistic deviations that signal unsafe...

3 months ago cs.AI cs.LG PDF

Benchmark HIGH

Red Teaming LLMs as Socio-Technical Practice: From Exploration and Data Creation to Evaluation

Adriana Alvarado Garcia, Ruyuan Wan, Ozioma C. Oguine +1 more

Recently, red teaming, with roots in security, has become a key evaluative approach to ensure the safety and reliability of Generative Artificial...

3 months ago cs.CY cs.AI cs.CL PDF

Survey HIGH

QRS: A Rule-Synthesizing Neuro-Symbolic Triad for Autonomous Vulnerability Discovery

George Tsigkourakos, Constantinos Patsakis

Static Application Security Testing (SAST) tools are integral to modern DevSecOps pipelines, yet tools like CodeQL, Semgrep, and SonarQube remain...

3 months ago cs.CR PDF

Tool HIGH

Stop Testing Attacks, Start Diagnosing Defenses: The Four-Checkpoint Framework Reveals Where LLM Safety Breaks

Hayfa Dhabhi, Kashyap Thimmaraju

Large Language Models (LLMs) deploy safety mechanisms to prevent harmful outputs, yet these defenses remain vulnerable to adversarial prompts. While...

3 months ago cs.CR cs.AI cs.CY PDF

Benchmark HIGH

CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation

Chaeyun Kim, YongTaek Lim, Kihyun Kim +2 more

Existing red-teaming benchmarks, when adapted to new languages via direct translation, fail to capture socio-technical vulnerabilities rooted in...

3 months ago cs.CY cs.AI PDF

Attack HIGH

MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

Georgios Syros, Evan Rose, Brian Grinstead +4 more

Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by directly interacting with web sites and...

3 months ago cs.CR cs.AI PDF

Attack HIGH

One RNG to Rule Them All: How Randomness Becomes an Attack Vector in Machine Learning

Kotekar Annapoorna Prabhu, Andrew Gan, Zahra Ghodsi

Machine learning relies on randomness as a fundamental component in various steps such as data sampling, data augmentation, weight initialization,...

3 months ago cs.CR cs.LG PDF

Attack HIGH

Red-teaming the Multimodal Reasoning: Jailbreaking Vision-Language Models via Cross-modal Entanglement Attacks

Yu Yan, Sheng Sun, Shengjia Cheng +3 more

Vision-Language Models (VLMs) with multimodal reasoning capabilities are high-value attack targets, given their potential for handling complex...

3 months ago cs.CR cs.AI PDF

Attack HIGH

StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors

Suraj Ranganath, Atharv Ramesh

AI-text detectors face a critical robustness challenge: adversarial paraphrasing attacks that preserve semantics while evading detection. We...

3 months ago cs.LG cs.AI cs.CR PDF

Attack HIGH

StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors

Suraj Ranganath, Atharv Ramesh

AI-text detectors face a critical robustness challenge: adversarial paraphrasing attacks that preserve semantics while evading detection. We...

3 months ago cs.LG cs.AI cs.CR PDF

Attack HIGH

Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing

Jona te Lintelo, Lichao Wu, Stjepan Picek

The rapid adoption of Mixture-of-Experts (MoE) architectures marks a major shift in the deployment of Large Language Models (LLMs). MoE LLMs improve...

3 months ago cs.CR PDF

Attack HIGH

Dashed Line Defense: Plug-And-Play Defense Against Adaptive Score-Based Query Attacks

Yanzhang Fu, Zizheng Guo, Jizhou Luo

Score-based query attacks pose a serious threat to deep learning models by crafting adversarial examples (AEs) using only black-box access to model...

3 months ago cs.LG cs.CR PDF

Attack HIGH

Retrieval Pivot Attacks in Hybrid RAG: Measuring and Mitigating Amplified Leakage from Vector Seeds to Graph Expansion

Scott Thornton

Hybrid Retrieval-Augmented Generation (RAG) pipelines combine vector similarity search with knowledge graph expansion for multi-hop reasoning. We...

3 months ago cs.CR cs.IR cs.LG PDF

Benchmark HIGH

From Assistant to Double Agent: Formalizing and Benchmarking Attacks on OpenClaw for Personalized Local AI Agent

Yuhang Wang, Feiming Xu, Zheng Lin +6 more

Although large language model (LLM)-based agents, exemplified by OpenClaw, are increasingly evolving from task-oriented systems into personalized AI...

3 months ago cs.AI PDF

Tool HIGH

NutVLM: A Self-Adaptive Defense Framework against Full-Dimension Attacks for Vision Language Models in Autonomous Driving

Xiaoxu Peng, Dong Zhou, Jianwen Zhang +3 more

Vision Language Models (VLMs) have advanced perception in autonomous driving (AD), but they remain vulnerable to adversarial threats. These risks...

3 months ago cs.CV eess.IV PDF

Attack HIGH

Evasion of IoT Malware Detection via Dummy Code Injection

Sahar Zargarzadeh, Mohammad Islam

The Internet of Things (IoT) has revolutionized connectivity by linking billions of devices worldwide. However, this rapid expansion has also...

3 months ago cs.CR cs.LG PDF

Attack HIGH

Robustness of Vision Language Models Against Split-Image Harmful Input Attacks

Md Rafi Ur Rashid, MD Sadik Hossain Shanto, Vishnu Asutosh Dasu +1 more

Vision-Language Models (VLMs) are now a core part of modern AI. Recent work proposed several visual jailbreak attacks using single/ holistic images....

3 months ago cs.CV cs.AI PDF

Benchmark HIGH

CyberExplorer: Benchmarking LLM Offensive Security Capabilities in a Real-World Attacking Simulation Environment

Nanda Rani, Kimberly Milner, Minghao Shao +9 more

Real-world offensive security operations are inherently open-ended: attackers explore unknown attack surfaces, revise hypotheses under uncertainty,...

3 months ago cs.CR cs.AI cs.MA PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial