AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total: 2,529
Attack: 969
Benchmark: 729
Defense: 345
Tool: 272
Survey: 142

Showing 81–100 of 312 papers

Attack MEDIUM

From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration

Yizhe Xie, Congcong Zhu, Xinyue Zhang +5 more

Large Language Model-based Multi-Agent Systems (LLM-MAS) are increasingly applied to complex collaborative scenarios. However, their collaborative...

2 months ago cs.MA cs.AI PDF

Attack MEDIUM

Inherited Goal Drift: Contextual Pressure Can Undermine Agentic Goals

Achyutha Menon, Magnus Saebo, Tyler Crosse +3 more

The accelerating adoption of language models (LMs) as agents for deployment in long-context tasks motivates a thorough understanding of goal drift:...

2 months ago cs.AI PDF

Attack MEDIUM

Zero-Knowledge Federated Learning with Lattice-Based Hybrid Encryption for Quantum-Resilient Medical AI

Edouard Lansiaux

Federated Learning (FL) enables collaborative training of medical AI models across hospitals without centralizing patient data. However, the exchange...

2 months ago cs.CR cs.AI PDF

Attack MEDIUM

From Shallow to Deep: Pinning Semantic Intent via Causal GRPO

Shuyi Zhou, Zeen Song, Wenwen Qiang +4 more

Large Language Models remain vulnerable to adversarial prefix attacks (e.g., "Sure, here is") despite robust standard safety. We diagnose this...

2 months ago cs.LG PDF

Attack MEDIUM

Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution

Guoxin Shi, Haoyu Wang, Zaihui Yang +2 more

Adversarial behavior plays a central role in aligning large language models with human values. However, existing alignment methods largely rely on...

2 months ago cs.CR cs.AI PDF

Attack MEDIUM

Tracking Capabilities for Safer Agents

Martin Odersky, Yaoyu Zhao, Yichen Xu +2 more

AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause...

2 months ago cs.AI cs.PL PDF

Attack MEDIUM

Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs

Jingyuan Xie, Wenjie Wang, Ji Wu +1 more

Supervised fine-tuning (SFT) is essential for the development of medical large language models (LLMs), yet prior poisoning studies have mainly...

2 months ago cs.CR cs.AI cs.LG PDF

Attack MEDIUM

SwitchCraft: Training-Free Multi-Event Video Generation with Attention Controls

Qianxun Xu, Chenxi Song, Yujun Cai +1 more

Recent advances in text-to-video diffusion models have enabled high-fidelity and temporally coherent video synthesis. However, current models are...

2 months ago cs.CV PDF

Attack MEDIUM

Adversarial Hubness Detector: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

Idan Habler, Vineeth Sai Narajala, Stav Koren +2 more

Retrieval-Augmented Generation (RAG) systems are essential to contemporary AI applications, allowing large language models to obtain external...

2 months ago cs.CR cs.AI PDF

Attack MEDIUM

Training Agents to Self-Report Misbehavior

Bruce W. Lee, Chen Yueh-Han, Tomek Korbak

Frontier AI agents may pursue hidden goals while concealing their pursuit from oversight. Alignment training aims to prevent such behavior by...

2 months ago cs.LG cs.AI PDF

Attack MEDIUM

Manifold of Failure: Behavioral Attraction Basins in Language Models

Sarthak Munshi, Manish Bhatt, Vineeth Sai Narajala +4 more

While prior work has focused on projecting adversarial examples back onto the manifold of natural data to restore safety, we argue that a...

2 months ago cs.LG cs.AI cs.CR PDF

Attack MEDIUM

Adversarial Intent is a Latent Variable: Stateful Trust Inference for Securing Multimodal Agentic RAG

Inderjeet Singh, Vikas Pahuja, Aishvariya Priya Rathina Sabapathy +8 more

Current stateless defences for multimodal agentic RAG fail to detect adversarial strategies that distribute malicious semantics across retrieval,...

2 months ago cs.CR cs.AI cs.CL PDF

Attack MEDIUM

The LLMbda Calculus: AI Agents, Conversations, and Information Flow

Zac Garby, Andrew D. Gordon, David Sands

A conversation with a large language model (LLM) is a sequence of prompts and responses, with each response generated from the preceding...

2 months ago cs.PL cs.AI cs.CR PDF

Attack MEDIUM

Agents of Chaos

Natalie Shapira, Chris Wendler, Avery Yen +35 more

We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent...

2 months ago cs.AI cs.CY PDF

Attack MEDIUM

vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models

Xunzhuo Liu, Huamin Chen, Samzong Lu +27 more

As large language models (LLMs) diversify across modalities, capabilities, and cost profiles, the problem of intelligent request routing -- selecting...

2 months ago cs.NI cs.AI PDF

Attack MEDIUM

Efficient Multi-Party Secure Comparison over Different Domains with Preprocessing Assistance

Kaiwen Wang, Xiaolin Chang, Yuehan Dong +1 more

Secure comparison is a fundamental primitive in multi-party computation, supporting privacy-preserving applications such as machine learning and data...

2 months ago cs.CR PDF

Attack MEDIUM

AndroWasm: an Empirical Study on Android Malware Obfuscation through WebAssembly

Diego Soi, Silvia Lucia Sanna, Lorenzo Pisu +2 more

In recent years, stealthy Android malware has increasingly adopted sophisticated techniques to bypass automatic detection mechanisms and harden...

2 months ago cs.CR PDF

Attack MEDIUM

DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs

Justin Albrethsen, Yash Datta, Kunal Kumar +1 more

While Large Language Model (LLM) capabilities have scaled, safety guardrails remain largely stateless, treating multi-turn dialogues as a series of...

2 months ago cs.AI cs.ET cs.LG PDF

Attack MEDIUM

Policy Compiler for Secure Agentic Systems

Nils Palumbo, Sarthak Choudhary, Jihye Choi +2 more

LLM-based agents are increasingly being deployed in contexts requiring complex authorization policies: customer service protocols, approval...

2 months ago cs.CR cs.AI cs.MA PDF
