AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 2861–2880 of 3,023 papers

Attack HIGH

Untargeted Jailbreak Attack

Xinzhe Huang, Wenjing Hu, Tianhang Zheng +5 more

Existing gradient-based jailbreak attacks on Large Language Models (LLMs) typically optimize adversarial suffixes to align the LLM output with...

8 months ago cs.CR cs.AI PDF

Attack HIGH

External Data Extraction Attacks against Retrieval-Augmented Large Language Models

Yu He, Yifei Chen, Yiming Li +5 more

In recent years, RAG has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG...

8 months ago cs.CR PDF

Benchmark MEDIUM

Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain

Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru +6 more

While finetuning AI agents on interaction data -- such as web browsing or tool use -- improves their capabilities, it also introduces critical...

8 months ago cs.CR cs.AI cs.LG PDF

Benchmark MEDIUM

Zero-Shot Robustness of Vision Language Models Via Confidence-Aware Weighting

Nikoo Naghavian, Mostafa Tavassolipour

Vision-language models like CLIP demonstrate impressive zero-shot generalization but remain highly vulnerable to adversarial attacks. In this work,...

8 months ago cs.CV PDF

Attack HIGH

Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs

Zhixin Xie, Xurui Song, Jun Luo

Despite substantial efforts in safety alignment, recent research indicates that Large Language Models (LLMs) remain highly susceptible to jailbreak...

8 months ago cs.CR PDF

Attack MEDIUM

Adversarial Reinforcement Learning for Offensive and Defensive Agents in a Simulated Zero-Sum Network Environment

Abrar Shahid, Ibteeker Mahir Ishum, AKM Tahmidul Haque +2 more

This paper presents a controlled study of adversarial reinforcement learning in network security through a custom OpenAI Gym environment that models...

8 months ago cs.LG cs.AI cs.CR PDF

Defense MEDIUM

VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation

Lesly Miculicich, Mihir Parmar, Hamid Palangi +4 more

The deployment of autonomous AI agents in sensitive domains, such as healthcare, introduces critical risks to safety, security, and privacy. These...

8 months ago cs.SE cs.AI cs.CR PDF

Attack HIGH

A Statistical Method for Attack-Agnostic Adversarial Attack Detection with Compressive Sensing Comparison

Chinthana Wimalasuriya, Spyros Tragoudas

Adversarial attacks present a significant threat to modern machine learning systems. Yet, existing detection methods often lack the ability to detect...

8 months ago cs.CR cs.CV cs.LG PDF

Tool MEDIUM

MALF: A Multi-Agent LLM Framework for Intelligent Fuzzing of Industrial Control Protocols

Bowei Ning, Xuejun Zong, Kan He

Industrial control systems (ICS) are vital to modern infrastructure but increasingly vulnerable to cybersecurity threats, particularly through...

8 months ago cs.CR PDF

Attack HIGH

ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks

Zhaorun Chen, Xun Liu, Mintong Kang +4 more

As vision-language models (VLMs) gain prominence, their multimodal interfaces also introduce new safety vulnerabilities, making the safety evaluation...

8 months ago cs.AI cs.LG PDF

Benchmark HIGH

RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents

Chengquan Guo, Chulin Xie, Yu Yang +6 more

Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic...

8 months ago cs.SE PDF

Benchmark MEDIUM

Who's Wearing? Ear Canal Biometric Key Extraction for User Authentication on Wireless Earbuds

Chenpei Huang, Lingfeng Yao, Hui Zhong +5 more

Ear canal scanning/sensing (ECS) has emerged as a novel biometric authentication method for mobile devices paired with wireless earbuds. Existing...

8 months ago cs.CR cs.HC PDF

Tool HIGH

ToolTweak: An Attack on Tool Selection in LLM-based Agents

Jonathan Sneh, Ruomei Yan, Jialin Yu +6 more

As LLMs increasingly power agents that interact with external tools, tool use has become an essential mechanism for extending their capabilities....

8 months ago cs.CR cs.AI PDF

Attack HIGH

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks

Ruohao Guo, Afshin Oroojlooy, Roshan Sridhar +3 more

Despite recent rapid progress in AI safety, current large language models remain vulnerable to adversarial attacks in multi-turn interaction...

8 months ago cs.LG cs.AI cs.CL PDF

Defense MEDIUM

UpSafe°C: Upcycling for Controllable Safety in Large Language Models

Yuhao Sun, Zhuoer Xu, Shiwen Cui +4 more

Large Language Models (LLMs) have achieved remarkable progress across a wide range of tasks, but remain vulnerable to safety risks such as harmful...

8 months ago cs.AI cs.CR cs.LG PDF

Attack HIGH

Dynamic Target Attack

Kedong Xiu, Churui Zeng, Tianhang Zheng +6 more

Existing gradient-based jailbreak attacks typically optimize an adversarial suffix to induce a fixed affirmative response, e.g., "Sure, here...

8 months ago cs.CR cs.AI PDF

Benchmark LOW

FalseCrashReducer: Mitigating False Positive Crashes in OSS-Fuzz-Gen Using Agentic AI

Paschal C. Amusuo, Dongge Liu, Ricardo Andres Calvo Mendez +3 more

Fuzz testing has become a cornerstone technique for identifying software bugs and security vulnerabilities, with broad adoption in both industry and...

8 months ago cs.SE cs.CR cs.MA PDF

Attack MEDIUM

Inverse Language Modeling towards Robust and Grounded LLMs

Davide Gabrielli, Simone Sestito, Iacopo Masi

The current landscape of defensive mechanisms for LLMs is fragmented and underdeveloped, unlike prior work on classifiers. To further promote...

8 months ago cs.CL PDF

Benchmark MEDIUM

Are LLMs Better GNN Helpers? Rethinking Robust Graph Learning under Deficiencies with Iterative Refinement

Zhaoyan Wang, Zheng Gao, Arogya Kharel +1 more

Graph Neural Networks (GNNs) are widely adopted in Web-related applications, serving as a core technique for learning from graph-structured data,...

8 months ago cs.LG cs.AI PDF

Benchmark LOW

Human-AI Teaming Co-Learning in Military Operations

Clara Maathuis, Kasper Cools

In a time of rapidly evolving military threats and increasingly complex operational environments, the integration of AI into military operations...

8 months ago cs.AI PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
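
As a rough illustration of that classification, the sketch below models one indexed paper as a record carrying a type and a relevance label, then filters a small sample by both. The type and relevance labels, titles, and arXiv categories come from the listing above. The names `PaperRecord`, `filter_papers`, and the `related_cves` field are hypothetical, chosen only for this example, and do not describe AI Threat Alert's actual data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class PaperType(Enum):
    ATTACK = "Attack"
    DEFENSE = "Defense"
    BENCHMARK = "Benchmark"
    SURVEY = "Survey"
    TOOL = "Tool"


class Relevance(Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


@dataclass(frozen=True)
class PaperRecord:
    """One indexed paper (hypothetical schema, for illustration only)."""
    title: str
    paper_type: PaperType
    relevance: Relevance
    arxiv_categories: tuple[str, ...]
    # Hypothetical field: identifiers of CVEs the paper is linked to.
    # Left empty here because the listing does not show any.
    related_cves: tuple[str, ...] = ()


def filter_papers(papers, paper_type=None, relevance=None):
    """Return the papers matching the given type and relevance.

    A filter left as None matches every paper.
    """
    return [
        p for p in papers
        if (paper_type is None or p.paper_type is paper_type)
        and (relevance is None or p.relevance is relevance)
    ]


# Sample records copied from entries in the listing above.
SAMPLE = [
    PaperRecord("Untargeted Jailbreak Attack",
                PaperType.ATTACK, Relevance.HIGH, ("cs.CR", "cs.AI")),
    PaperRecord("VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation",
                PaperType.DEFENSE, Relevance.MEDIUM, ("cs.SE", "cs.AI", "cs.CR")),
    PaperRecord("ToolTweak: An Attack on Tool Selection in LLM-based Agents",
                PaperType.TOOL, Relevance.HIGH, ("cs.CR", "cs.AI")),
    PaperRecord("Human-AI Teaming Co-Learning in Military Operations",
                PaperType.BENCHMARK, Relevance.LOW, ("cs.AI",)),
]

high_attacks = filter_papers(SAMPLE, PaperType.ATTACK, Relevance.HIGH)
print([p.title for p in high_attacks])
```

Keeping type and relevance as separate fields lets either one be filtered on its own, for example `filter_papers(SAMPLE, relevance=Relevance.HIGH)` to list every high-relevance paper regardless of type.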

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
