Poisoning Learned Index Structures: Static and Dynamic Adversarial Attacks on ALEX
Allen Jue
Learned index structures achieve high performance by modeling the cumulative distribution function (CDF) of keys, but this reliance on data...
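The premise in that abstract — a learned index replaces comparison-based search with a model of the keys' CDF that predicts where a key sits in sorted storage — can be made concrete with a toy sketch. Everything below (the LinearLearnedIndex name, the least-squares fit, the error-bounded local search) is an illustrative assumption, not ALEX's actual design or code from the paper.

```python
# Toy learned index over a non-empty sorted key array (illustrative only).
# A linear model approximates the CDF, predicting each key's position;
# lookups correct the prediction with a bounded local search.
import bisect

class LinearLearnedIndex:
    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error bounds the local search window.
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        pos = self._predict(key)
        lo = max(0, pos - self.err)
        hi = min(len(self.keys), pos + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None
```

The sketch also shows why poisoning is a natural threat here: keys inserted to inflate the model's worst-case error (err) widen every subsequent lookup's search window, which is the kind of data dependence the title's attacks target.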
Mengnan Zhao, Lihe Zhang, Tianhang Zheng +2 more
Fast Adversarial Training (FAT) has attracted significant attention due to its efficiency in enhancing neural network robustness against adversarial... (a minimal single-step sketch follows this list)
Mengnan Zhao, Lihe Zhang, Bo Wang +3 more
Fast Adversarial Training (FAT) has proven effective in enhancing model robustness by encouraging networks to learn perturbation-invariant...
Zonghao Ying, Haozheng Wang, Jiangfan Liu +5 more
Large Language Model (LLM) agents are increasingly used to automate complex workflows, but integrating untrusted external data with privileged...
Xinhe Wang, Katia Sycara, Yaqi Xie
Large (vision-)language models exhibit remarkable capability but remain highly susceptible to jailbreaking. Existing safety training approaches aim...
Yu Cui, Ruiqing Yue, Hang Fu +6 more
With the wide adoption of personal AI assistants such as OpenClaw, privacy leakage in user interaction contexts with large language model (LLM)...
Rong Xiang
Recent evidence suggests that frontier AI systems can exhibit agentic misalignment, generating and executing harmful actions derived from internally...
Naheed Rayhan, Sohely Jahan
Large language models (LLMs) are increasingly integrated into sensitive workflows, raising the stakes for adversarial robustness and safety. This...
Zihan Wang, Rui Zhang, Yu Liu +4 more
LLM agents increasingly rely on skills to encapsulate reusable capabilities via progressively disclosed instructions. High-quality skills inject...
Jiali Wei, Ming Fan, Guoheng Sun +3 more
The growing application of large language models (LLMs) in safety-critical domains has raised urgent concerns about their security. Many recent...
Guilin Deng, Silong Chen, Yuchuan Luo +6 more
Federated Large Language Models (FedLLMs) enable multiple parties to collaboratively fine-tune LLMs without sharing raw data, addressing challenges...
Jesse Zymet, Andy Luo, Swapnil Shinde +2 more
Many approaches to LLM red-teaming leverage an attacker LLM to discover jailbreaks against a target. Several of them task the attacker with...
Irti Haq, Belén Saldías
As state-of-the-art Large Language Models (LLMs) have become ubiquitous, ensuring equitable performance across diverse demographics is critical....
Yannis Belkhiter, Giulio Zizzo, Sergio Maffeis +2 more
The growth of agentic AI has drawn significant attention to function calling Large Language Models (LLMs), which are designed to extend the...
Abhijit Talluri
Adversarial robustness evaluation underpins every claim of trustworthy ML deployment, yet the field suffers from fragmented protocols and undetected...
Nandakrishna Giri, Asmitha K. A., Serena Nicolazzo +2 more
Machine learning-based static malware detectors remain vulnerable to adversarial evasion techniques, such as metamorphic engine mutations. To address...
Pranav Pallerla, Wilson Naik Bhukya, Bharath Vemula +1 more
Retrieval-augmented generation (RAG) systems are increasingly deployed in sensitive domains such as healthcare and law, where they rely on private,...
MinJae Jung, YongTaek Lim, Chaeyun Kim +3 more
While Large Language Models (LLMs) are widely used, they remain susceptible to jailbreak prompts that can elicit harmful or inappropriate responses....
Hanrui Luo, Shreyank N Gowda
Detecting jailbreak behaviour in large language models remains challenging, particularly when strongly aligned models produce harmful outputs only...
Ruixuan Liu, David Evans, Li Xiong
Indistinguishability properties such as differential privacy bounds or low empirically measured membership inference are widely treated as proxies to...
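For the two Fast Adversarial Training entries above (Zhao et al.), the core idea — hardening a network on cheap single-step adversarial perturbations rather than expensive multi-step PGD — can be sketched as follows. This is a generic FGSM-with-random-init training loop, a common FAT baseline; the function name, epsilon value, and training setup are assumptions, not the method of either listed paper.

```python
# Minimal single-step ("fast") adversarial training epoch in PyTorch
# (illustrative sketch; model, loader, optimizer, epsilon are assumptions).
import torch
import torch.nn.functional as F

def fast_adv_train_epoch(model, loader, optimizer, epsilon=8 / 255):
    model.train()
    for x, y in loader:
        # Random start inside the L-infinity epsilon ball.
        delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
        delta.requires_grad_(True)
        # One FGSM step on the perturbation.
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + epsilon * grad.sign()).clamp(-epsilon, epsilon).detach()
        # Update the model on the perturbed (and re-clamped) inputs.
        optimizer.zero_grad()
        F.cross_entropy(model((x + delta).clamp(0.0, 1.0)), y).backward()
        optimizer.step()
```

The random initialization inside the epsilon ball is a commonly used guard against the catastrophic overfitting that plain FGSM training can exhibit.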