AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total

2,529

Attack

969

Benchmark

729

Defense

345

Tool

272

Survey

142

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 81–100 of 625 papers

Clear filters

Attack HIGH

Do No Harm: Exposing Hidden Vulnerabilities of LLMs via Persona-based Client Simulation Attack in Psychological Counseling

Qingyang Xu, Yaling Shen, Stephanie Fong +7 more

The increasing use of large language models (LLMs) in mental healthcare raises safety concerns in high-stakes therapeutic interactions. A key...

1 months ago cs.CL PDF

Attack HIGH

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Siyuan Li, Zehao Liu, Xi Lin +6 more

As Large Language Models (LLMs) are increasingly deployed in complex applications, their vulnerability to adversarial attacks raises urgent safety...

1 months ago cs.CR cs.AI PDF

Attack HIGH

SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits

Zikai Zhang, Rui Hu, Olivera Kotevska +1 more

Large Language Models (LLMs) are powerful tools for answering user queries, yet they remain highly vulnerable to jailbreak attacks. Existing...

1 months ago cs.CR cs.AI PDF

Attack HIGH

No Attacker Needed: Unintentional Cross-User Contamination in Shared-State LLM Agents

Tiankai Yang, Jiate Li, Yi Nian +5 more

LLM-based agents increasingly operate across repeated sessions, maintaining task states to ensure continuity. In many deployments, a single agent...

1 months ago cs.CL cs.AI cs.CR PDF

Attack HIGH

AgentWatcher: A Rule-based Prompt Injection Monitor

Yanting Wang, Wei Zou, Runpeng Geng +1 more

Large language models (LLMs) and their applications, such as agents, are highly vulnerable to prompt injection attacks. State-of-the-art prompt...

1 months ago cs.CR PDF

Attack HIGH

When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion

Jiaqing Li, Zhibo Zhang, Shide Zhou +3 more

Model merging has emerged as a powerful technique for combining specialized capabilities from multiple fine-tuned LLMs without additional training...

1 months ago cs.CR PDF

Attack HIGH

Adversarial Prompt Injection Attack on Multimodal Large Language Models

Meiwen Ding, Song Xia, Chenqi Kong +1 more

Although multimodal large language models (MLLMs) are increasingly deployed in real-world applications, their instruction-following behavior leaves...

1 months ago cs.CV cs.AI PDF

Attack HIGH

Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

Kavindu Herath, Joshua Zhao, Saurabh Bagchi

Backdoor attacks on federated learning (FL) are most often evaluated with synthetic corner patches or out-of-distribution (OOD) patterns that are...

1 months ago cs.CR cs.AI cs.CV PDF

Attack HIGH

Dummy-Aware Weighted Attack (DAWA): Breaking the Safe Sink in Dummy Class Defenses

Yunrui Yu, Xuxiang Feng, Pengda Qin +5 more

Adversarial robustness evaluation faces a critical challenge as new defense paradigms emerge that can exploit limitations in existing assessment...

1 months ago cs.LG cs.CR PDF

Attack HIGH

Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning

Bilgehan Sel, Xuanli He, Alwin Peng +2 more

Fine-tuning APIs offered by major AI providers create new attack surfaces where adversaries can bypass safety measures through targeted fine-tuning....

1 months ago cs.CR cs.AI cs.CL PDF

Attack HIGH

\texttt{ReproMIA}: A Comprehensive Analysis of Model Reprogramming for Proactive Membership Inference Attacks

Chihan Huang, Huaijin Wang, Shuai Wang

The pervasive deployment of deep learning models across critical domains has concurrently intensified privacy concerns due to their inherent...

1 months ago cs.LG cs.CR PDF

Attack HIGH

XSPA: Crafting Imperceptible X-Shaped Sparse Adversarial Perturbations for Transferable Attacks on VLMs

Chengyin Hu, Jiaju Han, Xuemeng Sun +6 more

Vision-language models (VLMs) rely on a shared visual-textual representation space to perform tasks such as zero-shot classification, image...

1 months ago cs.CV PDF

Attack HIGH

Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers

Haochuan Kevin Wang

We present a stage-decomposed analysis of prompt injection attacks against five frontier LLM agents. Prior work measures task-level attack success...

1 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

Shape and Substance: Dual-Layer Side-Channel Attacks on Local Vision-Language Models

Eyal Hadad, Mordechai Guri

On-device Vision-Language Models (VLMs) promise data privacy via local execution. However, we show that the architectural shift toward Dynamic...

1 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

On the Vulnerability of Deep Automatic Modulation Classifiers to Explainable Backdoor Threats

Younes Salmi, Hanna Bogucka

Deep learning (DL) has been widely studied for assisting applications of modern wireless communications. One of the applications is automatic...

1 months ago cs.CR PDF

Attack HIGH

Physical Backdoor Attack Against Deep Learning-Based Modulation Classification

Younes Salmi, Hanna Bogucka

Deep Learning (DL) has become a key technology that assists radio frequency (RF) signal classification applications, such as modulation...

1 months ago cs.CR PDF

Attack HIGH

Mitigating Evasion Attacks in Fog Computing Resource Provisioning Through Proactive Hardening

Younes Salmi, Hanna Bogucka

This paper investigates the susceptibility to model integrity attacks that overload virtual machines assigned by the k-means algorithm used for...

1 months ago cs.CR cs.LG PDF

Attack HIGH

Prompt Attack Detection with LLM-as-a-Judge and Mixture-of-Models

Hieu Xuan Le, Benjamin Goh, Quy Anh Tang

Prompt attacks, including jailbreaks and prompt injections, pose a critical security risk to Large Language Model (LLM) systems. In production,...

1 months ago cs.CL PDF

Attack HIGH

PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems

Haozhen Wang, Haoyue Liu, Jionghao Zhu +3 more

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications. However, their practical deployment is...

1 months ago cs.CR cs.AI PDF

Attack HIGH

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

Alexander Panfilov, Peter Romov, Igor Shilov +3 more

LLM agents like Claude Code can not only write code but also be used for autonomous AI research and engineering \citep{rank2026posttrainbench,...

1 months ago cs.LG cs.AI cs.CR PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial