Survey HIGH
Charafeddine Mouzouni
LLM agents with tool access can discover and exploit security vulnerabilities. This is known. What is not known is which features of a system prompt...
1 month ago cs.CR cs.AI cs.CL
Tool HIGH
Zhuowen Yuan, Zhaorun Chen, Zhen Xiang +5 more
Existing research on LLM agent security mainly focuses on prompt injection and unsafe input/output behaviors. However, as agents increasingly rely on...
Attack HIGH
Siyuan Li, Zehao Liu, Xi Lin +6 more
As Large Language Models (LLMs) are increasingly deployed in complex applications, their vulnerability to adversarial attacks raises urgent safety...
1 month ago cs.CR cs.AI
Defense HIGH
Ayush Garg, Sophia Hager, Jacob Montiel +5 more
Security teams face a challenge: the volume of newly disclosed Common Vulnerabilities and Exposures (CVEs) far exceeds the capacity to manually...
1 month ago cs.CR cs.AI cs.CL
Attack HIGH
Zikai Zhang, Rui Hu, Olivera Kotevska +1 more
Large Language Models (LLMs) are powerful tools for answering user queries, yet they remain highly vulnerable to jailbreak attacks. Existing...
1 month ago cs.CR cs.AI
Attack HIGH
Tiankai Yang, Jiate Li, Yi Nian +5 more
LLM-based agents increasingly operate across repeated sessions, maintaining task states to ensure continuity. In many deployments, a single agent...
1 month ago cs.CL cs.AI cs.CR
Attack HIGH
Yanting Wang, Wei Zou, Runpeng Geng +1 more
Large language models (LLMs) and their applications, such as agents, are highly vulnerable to prompt injection attacks. State-of-the-art prompt...
Tool HIGH
Anubhab Sahu, Diptisha Samanta, Reza Soosahabi
System Instructions in Large Language Models (LLMs) are commonly used to enforce safety policies, define agent behavior, and protect sensitive...
1 month ago cs.CR cs.AI
Tool HIGH
Jingning Xu, Haochen Luo, Chen Liu
Vision-language models (VLMs) are vulnerable to adversarial image perturbations. Existing works based on adversarial training against task-specific...
1 month ago cs.CV cs.MM
Attack HIGH
Jiaqing Li, Zhibo Zhang, Shide Zhou +3 more
Model merging has emerged as a powerful technique for combining specialized capabilities from multiple fine-tuned LLMs without additional training...
Tool HIGH
Aengus Lynch
Autonomous AI agents are being deployed with filesystem access, email control, and multi-step planning. This thesis contributes to four open problems...
1 month ago cs.LG cs.AI
Tool HIGH
Chong Xiang, Drew Zagieboylo, Shaona Ghosh +5 more
AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions...
1 month ago cs.CR cs.AI
Attack HIGH
Meiwen Ding, Song Xia, Chenqi Kong +1 more
Although multimodal large language models (MLLMs) are increasingly deployed in real-world applications, their instruction-following behavior leaves...
1 month ago cs.CV cs.AI
Attack HIGH
Kavindu Herath, Joshua Zhao, Saurabh Bagchi
Backdoor attacks on federated learning (FL) are most often evaluated with synthetic corner patches or out-of-distribution (OOD) patterns that are...
1 month ago cs.CR cs.AI cs.CV
Defense HIGH
Miles Farmer, Ekincan Ufuktepe, Anne Watson +4 more
Large Language Models (LLMs) have emerged as a popular choice in vulnerability detection studies given their foundational capabilities, open source...
1 month ago cs.SE cs.AI cs.CR
Attack HIGH
Yunrui Yu, Xuxiang Feng, Pengda Qin +5 more
Adversarial robustness evaluation faces a critical challenge as new defense paradigms emerge that can exploit limitations in existing assessment...
1 month ago cs.LG cs.CR
Tool HIGH
KrishnaSaiReddy Patil
LLM-based chatbots in government services face critical security gaps. Multi-turn adversarial attacks achieve over 90% success against current...
1 month ago cs.CR cs.AI
Attack HIGH
Bilgehan Sel, Xuanli He, Alwin Peng +2 more
Fine-tuning APIs offered by major AI providers create new attack surfaces where adversaries can bypass safety measures through targeted fine-tuning....
1 month ago cs.CR cs.AI cs.CL
Attack HIGH
Chihan Huang, Huaijin Wang, Shuai Wang
The pervasive deployment of deep learning models across critical domains has concurrently intensified privacy concerns due to their inherent...
1 month ago cs.LG cs.CR
Attack HIGH
Chengyin Hu, Jiaju Han, Xuemeng Sun +6 more
Vision-language models (VLMs) rely on a shared visual-textual representation space to perform tasks such as zero-shot classification, image...