AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.


Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 1121–1140 of 1,175 papers

Attack HIGH

Fine-Tuning Jailbreaks under Highly Constrained Black-Box Settings: A Three-Pronged Approach

Xiangfang Li, Yu Wang, Bo Li

With the rapid advancement of large language models (LLMs), ensuring their safe use becomes increasingly critical. Fine-tuning is a widely used...

8 months ago cs.CR PDF

Attack HIGH

Backdoor Attacks Against Speech Language Models

Alexandrine Fortier, Thomas Thebaud, Jesús Villalba +2 more

Large Language Models (LLMs) and their multimodal extensions are becoming increasingly popular. One common approach to enable multimodality is to...

8 months ago cs.CL cs.CR cs.SD PDF

Attack MEDIUM

Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors

Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang +1 more

Existing data poisoning attacks on retrieval-augmented generation (RAG) systems scale poorly because they require costly optimization of poisoned...

8 months ago cs.LG cs.CL cs.CR PDF

Attack HIGH

Attack logics, not outputs: Towards efficient robustification of deep neural networks by falsifying concept-based properties

Raik Dankworth, Gesina Schwalbe

Deep neural networks (NNs) for computer vision are vulnerable to adversarial attacks, i.e., miniscule malicious changes to inputs may induce...

8 months ago cs.CR cs.LG PDF

Attack MEDIUM

Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness

Tsubasa Takahashi, Shojiro Yamabe, Futa Waseda +1 more

Differential Attention (DA) has been proposed as a refinement to standard attention, suppressing redundant or noisy context through a subtractive...

8 months ago cs.LG cs.CR PDF

Attack MEDIUM

Has the Two-Decade-Old Prophecy Come True? Artificial Bad Intelligence Triggered by Merely a Single-Bit Flip in Large Language Models

Yu Yan, Siqi Lu, Yang Gao +4 more

Recently, Bit-Flip Attack (BFA) has garnered widespread attention for its ability to compromise software system integrity remotely through hardware...

8 months ago cs.CR PDF

Attack HIGH

SVDefense: Effective Defense against Gradient Inversion Attacks via Singular Value Decomposition

Chenxiang Luo, David K. Y. Yau, Qun Song

Federated learning (FL) enables collaborative model training without sharing raw data but is vulnerable to gradient inversion attacks (GIAs), where...

8 months ago cs.CR cs.LG PDF

Attack MEDIUM

A Call to Action for a Secure-by-Design Generative AI Paradigm

Dalal Alharthi, Ivan Roberto Kawaminami Garcia

Large language models have gained widespread prominence, yet their vulnerability to prompt injection and other adversarial attacks remains a critical...

8 months ago cs.CR cs.AI cs.LG PDF

Attack MEDIUM

MOLM: Mixture of LoRA Markers

Samar Fares, Nurbek Tastan, Noor Hussein +1 more

Generative models can generate photorealistic images at scale. This raises urgent concerns about the ability to detect synthetically generated images...

8 months ago cs.CV cs.CR cs.LG PDF

Attack MEDIUM

CHAI: Command Hijacking against embodied AI

Luis Burbano, Diego Ortiz, Qi Sun +5 more

Embodied Artificial Intelligence (AI) promises to handle edge cases in robotic vehicle systems where data is scarce by using common-sense reasoning...

9 months ago cs.CR cs.AI cs.LG PDF

Attack MEDIUM

Are Robust LLM Fingerprints Adversarially Robust?

Anshul Nasery, Edoardo Contente, Alkin Kaz +2 more

Model fingerprinting has emerged as a promising paradigm for claiming model ownership. However, robustness evaluations of these schemes have mostly...

9 months ago cs.CR cs.AI cs.LG PDF

Attack MEDIUM

DeepProv: Behavioral Characterization and Repair of Neural Networks via Inference Provenance Graph Analysis

Firas Ben Hmida, Abderrahmen Amich, Ata Kaboudi +1 more

Deep neural networks (DNNs) are increasingly being deployed in high-stakes applications, from self-driving cars to biometric authentication. However,...

9 months ago cs.CR cs.LG PDF

Attack LOW

Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents

Shuai Shao, Qihan Ren, Chen Qian +8 more

Advances in Large Language Models (LLMs) have enabled a new class of self-evolving agents that autonomously improve through interaction with the...

9 months ago cs.AI cs.CL cs.LG PDF

Attack HIGH

SafeBehavior: Simulating Human-Like Multistage Reasoning to Mitigate Jailbreak Attacks in Large Language Models

Qinjian Zhao, Jiaqi Wang, Zhiqiang Gao +3 more

Large Language Models (LLMs) have achieved impressive performance across diverse natural language processing tasks, but their growing power also...

9 months ago cs.AI PDF

Attack HIGH

Stealthy Yet Effective: Distribution-Preserving Backdoor Attacks on Graph Classification

Xiaobao Wang, Ruoxiao Sun, Yujun Zhang +4 more

Graph Neural Networks (GNNs) have demonstrated strong performance across tasks such as node classification, link prediction, and graph...

9 months ago cs.LG cs.CR PDF

Attack MEDIUM

The Impact of Scaling Training Data on Adversarial Robustness

Marco Zimmerli, Andreas Plesner, Till Aczel +1 more

Deep neural networks remain vulnerable to adversarial examples despite advances in architectures and training paradigms. We investigate how training...

9 months ago cs.CV cs.AI cs.CR PDF

Attack MEDIUM

Better Privilege Separation for Agents by Restricting Data Types

Dennis Jacob, Emad Alghamdi, Zhanhao Hu +2 more

Large language models (LLMs) have become increasingly popular due to their ability to interact with unstructured content. As such, LLMs are now a key...

9 months ago cs.CR cs.LG PDF

Attack HIGH

ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack

Yein Park, Jungwoo Park, Jaewoo Kang

Large language models (LLMs), despite being safety-aligned, exhibit brittle refusal behaviors that can be circumvented by simple linguistic changes....

9 months ago cs.AI PDF

Attack HIGH

Fingerprinting LLMs via Prompt Injection

Yuepeng Hu, Zhengyuan Jiang, Mengyuan Li +4 more

Large language models (LLMs) are often modified after release through post-processing such as post-training or quantization, which makes it...

9 months ago cs.CR cs.CL PDF

Attack LOW

Incentive-Aligned Multi-Source LLM Summaries

Yanchen Jiang, Zhe Feng, Aranyak Mehta

Large language models (LLMs) are increasingly used in modern search and answer systems to synthesize multiple, sometimes conflicting, texts into a...

9 months ago cs.CL cs.AI cs.GT PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
