AI Security Research

2,560+ academic papers on AI security, attacks, and defenses

Total

2,560

Attack

982

Benchmark

736

Defense

350

Tool

275

Survey

144

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 261–280 of 982 papers

Clear filters

Attack HIGH

DSBA: Dynamic Stealthy Backdoor Attack with Collaborative Optimization in Self-Supervised Learning

Jiayao Wang, Mohammad Maruf Hasan, Yiping Zhang +5 more

Self-Supervised Learning (SSL) has emerged as a significant paradigm in representation learning thanks to its ability to learn without extensive...

2 months ago cs.CR PDF

Attack MEDIUM

From Shallow to Deep: Pinning Semantic Intent via Causal GRPO

Shuyi Zhou, Zeen Song, Wenwen Qiang +4 more

Large Language Models remain vulnerable to adversarial prefix attacks (e.g., ``Sure, here is'') despite robust standard safety. We diagnose this...

2 months ago cs.LG PDF

Attack HIGH

Ignore All Previous Instructions: Jailbreaking as a de-escalatory peace building practise to resist LLM social media bots

Huw Day, Adrianna Jezierska, Jessica Woodgate

Large Language Models have intensified the scale and strategic manipulation of political discourse on social media, leading to conflict escalation....

2 months ago cs.HC cs.AI PDF

Attack MEDIUM

Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution

Guoxin Shi, Haoyu Wang, Zaihui Yang +2 more

Adversarial behavior plays a central role in aligning large language models with human values. However, existing alignment methods largely rely on...

2 months ago cs.CR cs.AI PDF

Attack HIGH

VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models

Duoxun Tang, Dasen Dai, Jiyao Wang +3 more

Video-LLMs are increasingly deployed in safety-critical applications but are vulnerable to Energy-Latency Attacks (ELAs) that exhaust computational...

2 months ago cs.CV cs.AI PDF

Attack HIGH

Jailbreaking Embodied LLMs via Action-level Manipulation

Xinyu Huang, Qiang Yang, Leming Shen +2 more

Embodied Large Language Models (LLMs) enable AI agents to interact with the physical world through natural language instructions and actions....

2 months ago cs.RO PDF

Attack HIGH

BadRSSD: Backdoor Attacks on Regularized Self-Supervised Diffusion Models

Jiayao Wang, Yiping Zhang, Mohammad Maruf Hasan +5 more

Self-supervised diffusion models learn high-quality visual representations via latent space denoising. However, their representation layer poses a...

2 months ago cs.CR cs.LG PDF

Attack MEDIUM

Tracking Capabilities for Safer Agents

Martin Odersky, Yaoyu Zhao, Yichen Xu +2 more

AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause...

2 months ago cs.AI cs.PL PDF

Attack HIGH

AMDS: Attack-Aware Multi-Stage Defense System for Network Intrusion Detection with Two-Stage Adaptive Weight Learning

Oluseyi Olukola, Nick Rahimi

Machine learning based network intrusion detection systems are vulnerable to adversarial attacks that degrade classification performance under both...

2 months ago cs.CR cs.AI PDF

Attack HIGH

IU: Imperceptible Universal Backdoor Attack

Hsin Lin, Yan-Lun Chen, Ren-Hung Hwang +1 more

Backdoor attacks pose a critical threat to the security of deep neural networks, yet existing efforts on universal backdoors often rely on visually...

2 months ago cs.CR cs.CV cs.LG PDF

Attack HIGH

MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs

Yilian Liu, Xiaojun Jia, Guoshun Nan +6 more

Multimodal Large Language Models (MLLMs) have achieved remarkable performance but remain vulnerable to jailbreak attacks that can induce harmful...

2 months ago cs.CV cs.AI cs.CR PDF

Attack HIGH

CaptionFool: Universal Image Captioning Model Attacks

Swapnil Parekh

Image captioning models are encoder-decoder architectures trained on large-scale image-text datasets, making them susceptible to adversarial attacks....

2 months ago cs.CV cs.AI PDF

Attack MEDIUM

Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs

Jingyuan Xie, Wenjie Wang, Ji Wu +1 more

Supervised fine-tuning (SFT) is essential for the development of medical large language models (LLMs), yet prior poisoning studies have mainly...

2 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

Atomicity for Agents: Exposing, Exploiting, and Mitigating TOCTOU Vulnerabilities in Browser-Use Agents

Linxi Jiang, Zhijie Liu, Haotian Luo +1 more

Browser-use agents are widely used for everyday tasks. They enable automated interaction with web pages through structured DOM based interfaces or...

2 months ago cs.CR cs.AI PDF

Attack MEDIUM

SwitchCraft: Training-Free Multi-Event Video Generation with Attention Controls

Qianxun Xu, Chenxi Song, Yujun Cai +1 more

Recent advances in text-to-video diffusion models have enabled high-fidelity and temporally coherent videos synthesis. However, current models are...

2 months ago cs.CV PDF

Attack MEDIUM

SwitchCraft: Training-Free Multi-Event Video Generation with Attention Controls

Qianxun Xu, Chenxi Song, Yujun Cai +1 more

Recent advances in text-to-video diffusion models have enabled high-fidelity and temporally coherent videos synthesis. However, current models are...

2 months ago cs.CV PDF

Attack HIGH

Hidden in the Metadata: Stealth Poisoning Attacks on Multimodal Retrieval-Augmented Generation

Kennedy Edemacu, Mohammad Mahdi Shokri

Retrieval-augmented generation (RAG) has emerged as a powerful paradigm for enhancing multimodal large language models by grounding their responses...

2 months ago cs.CR cs.AI PDF

Attack HIGH

Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

Xun Huang, Simeng Qin, Xiaoshuang Jia +6 more

As Large Language Models (LLMs) are increasingly used, their security risks have drawn increasing attention. Existing research reveals that LLMs are...

2 months ago cs.AI cs.CR PDF

Attack HIGH

AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification

Tian Zhang, Yiwei Xu, Juan Wang +8 more

Large language model (LLM) agents increasingly rely on external tools and retrieval systems to autonomously complete complex tasks. However, this...

2 months ago cs.CR cs.AI PDF

Attack HIGH

Reverse CAPTCHA: Evaluating LLM Susceptibility to Invisible Unicode Instruction Injection

Marcus Graves

We introduce Reverse CAPTCHA, an evaluation framework that tests whether large language models follow invisible Unicode-encoded instructions embedded...

2 months ago cs.CR cs.AI PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial