AI Security Research

2,583+ academic papers on AI security, attacks, and defenses

Total

2,583

Attack

994

Benchmark

740

Defense

355

Tool

275

Survey

146

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 201–220 of 655 papers

Clear filters

Tool HIGH

ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems

Yihao Zhang, Zeming Wei, Xiaokun Luan +7 more

Autonomous LLM-based agents increasingly operate as long-running processes forming densely interconnected multi-agent ecosystems, whose security...

1 months ago cs.CR cs.AI cs.LG PDF

Tool HIGH

ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems

Yihao Zhang, Zeming Wei, Xiaokun Luan +7 more

Autonomous LLM-based agents increasingly operate as long-running processes forming densely interconnected multi-agent ecosystems, whose security...

1 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

Mateusz Dziemian, Maxwell Lin, Xiaohan Fu +28 more

LLM based agents are increasingly deployed in high stakes settings where they process external data sources such as emails, documents, and code...

2 months ago cs.CR cs.AI PDF

Attack HIGH

From Storage to Steering: Memory Control Flow Attacks on LLM Agents

Zhenlin Xu, Xiaogang Zhu, Yu Yao +2 more

Modern agentic systems allow Large Language Model (LLM) agents to tackle complex tasks through extensive tool usage, forming structured control flows...

2 months ago cs.CR PDF

Benchmark HIGH

When Scanners Lie: Evaluator Instability in LLM Red-Teaming

Lidor Erez, Omer Hofman, Tamir Nizri +1 more

Automated LLM vulnerability scanners are increasingly used to assess security risks by measuring different attack type success rates (ASR). Yet the...

2 months ago cs.CR cs.PF PDF

Attack HIGH

Activation Surgery: Jailbreaking White-box LLMs without Touching the Prompt

Maël Jenny, Jérémie Dentan, Sonia Vanier +1 more

Most jailbreak techniques for Large Language Models (LLMs) primarily rely on prompt modifications, including paraphrasing, obfuscation, or...

2 months ago cs.CR PDF

Attack HIGH

Safety-Potential Pruning for Enhancing Safety Prompts Against VLM Jailbreaking Without Retraining

Chongxin Li, Hanzhang Wang, Lian Duan

Safety prompts constitute an interpretable layer of defense against jailbreak attacks in vision-language models (VLMs); however, their efficacy is...

2 months ago cs.CV PDF

Attack HIGH

GroupGuard: A Framework for Modeling and Defending Collusive Attacks in Multi-Agent Systems

Yiling Tao, Xinran Zheng, Shuo Yang +2 more

While large language model-based agents demonstrate great potential in collaborative tasks, their interactivity also introduces security...

2 months ago cs.AI PDF

Attack HIGH

Sirens' Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs

Zijian Ling, Pingyi Hu, Xiuyong Gao +6 more

Speech-driven large language models (LLMs) are increasingly accessed through speech interfaces, introducing new security risks via open acoustic...

2 months ago cs.CR cs.AI cs.SD PDF

Attack HIGH

PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses

Chenlong Yin, Runpeng Geng, Yanting Wang +1 more

Prompt injection poses serious security risks to real-world LLM applications, particularly autonomous agents. Although many defenses have been...

2 months ago cs.LG cs.CR PDF

Attack HIGH

SLICE: Semantic Latent Injection via Compartmentalized Embedding for Image Watermarking

Zheng Gao, Yifan Yang, Xiaoyu Li +4 more

Watermarking the initial noise of diffusion models has emerged as a promising approach for image provenance, but content-independent noise patterns...

2 months ago cs.CV cs.CR cs.LG PDF

Attack HIGH

Colluding LoRA: A Composite Attack on LLM Safety Alignment

Sihao Ding

We introduce Colluding LoRA (CoLoRA), an attack in which each adapter appears benign and plausibly functional in isolation, yet their linear...

2 months ago cs.CR cs.LG PDF

Attack HIGH

Agent Privilege Separation in OpenClaw: A Structural Defense Against Prompt Injection

Darren Cheng, Wen-Kwang Tsao

Prompt injection remains one of the most practical attack vectors against LLM-integrated applications. We replicate the Microsoft LLMail-Inject...

2 months ago cs.CR cs.AI PDF

Benchmark HIGH

Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies

Siddharth Srikanth, Freddie Liang, Sophie Hsu +9 more

Vision-Language-Action (VLA) models have significant potential to enable general-purpose robotic systems for a range of vision-language tasks....

2 months ago cs.RO cs.AI cs.CL PDF

Attack HIGH

Accelerating Suffix Jailbreak attacks with Prefix-Shared KV-cache

Xinhai Wang, Shaopeng Fu, Shu Yang +3 more

Suffix jailbreak attacks serve as a systematic method for red-teaming Large Language Models (LLMs) but suffer from prohibitive computational costs,...

2 months ago cs.CR cs.AI PDF

Attack HIGH

SpectralGuard: Detecting Memory Collapse Attacks in State Space Models

Davi Bonetto

State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a...

2 months ago cs.LG cs.CR PDF

Attack HIGH

CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks

Alexandre Le Mercier, Thomas Demeester, Chris Develder

State space models (SSMs) like Mamba have gained significant traction as efficient alternatives to Transformers, achieving linear complexity while...

2 months ago cs.CL PDF

Tool HIGH

Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI Systems

Sarbartha Banerjee, Prateek Sahu, Anjo Vahldiek-Oberwagner +2 more

Rapid progress in generative AI has given rise to Compound AI systems - pipelines comprised of multiple large language models (LLM), software tools...

2 months ago cs.CR cs.AI PDF

Attack HIGH

The Mirror Design Pattern: Strict Data Geometry over Model Scale for Prompt Injection Detection

J Alex Corll

Prompt injection defenses are often framed as semantic understanding problems and delegated to increasingly large neural detectors. For the first...

2 months ago cs.CR cs.AI PDF

Attack HIGH

Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover

Indranil Halder, Annesya Banerjee, Cengiz Pehlevan

Adversarial attacks can reliably steer safety-aligned large language models toward unsafe behavior. Empirically, we find that adversarial...

2 months ago cs.LG cs.AI PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial