AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 301–320 of 1,983 papers

Benchmark MEDIUM

Cybersecurity AI (CAI) Dataset

Víctor Mayoral-Vilches

We present CAI Dataset, a fourteen-month corpus of cybersecurity LLM trajectories collected through the open-source CAI agent framework, built in...

1 month ago cs.CR PDF

Benchmark MEDIUM

Towards Demystifying and Repairing LLM-in-the-Loop Vulnerabilities

Yujie Ma, Jialin Rong, Chenxi Yang +4 more

Large Language Models (LLMs) have been actively integrated into modern software systems as critical components. LLM-in-the-loop vulnerabilities, where...

1 month ago cs.SE cs.CR PDF

Attack HIGH

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

Ruoqi Guo, Yi Liu, Gelei Deng +7 more

Mobile graphical user interface (GUI) agents driven by vision-language models (VLMs) perceive the screen as rendered pixels and choose actions from...

1 month ago cs.CR cs.AI cs.CL PDF

Attack MEDIUM

A Wolf in Sheep's Clothing: Targeted Routing Hijacking in Federated RAG

Junjie Mu, Qiongxiu Li

Federated Retrieval-Augmented Generation (FedRAG) is attractive for privacy-sensitive applications because raw data remain local. As a result,...

1 month ago cs.CR cs.CL cs.IR PDF

Attack MEDIUM

SilentRetrieval: Hijacking Retrieval-Augmented Generation via Semantically-Preserving Adversarial Data Poisoning

Jiachen Qian

Retrieval-Augmented Generation (RAG) mitigates LLM hallucinations but introduces a critical vulnerability: corpus integrity. We present...

1 month ago cs.CR cs.CL cs.IR PDF

Attack HIGH

Can It Reach the Generator? Investigating the Survival of Prompt-Injection Attacks in Realistic RAG Settings

Yu Yin, Shuai Wang, Bevan Koopman +1 more

Recent generative engine optimisation (GEO) research has shown that prompt-injection attacks can push a target product to the top of an LLM's...

1 month ago cs.CR cs.IR PDF

Benchmark MEDIUM

KSAFE-MM: A Multimodal Safety Benchmark via Localized Contextualization for Korean Cultural Risks

Yongwoo Kim, Sojung An, Yunjin Park +8 more

Multimodal Large Language Models (MLLMs) exacerbate safety risks by introducing vulnerabilities across multiple modalities, such as language and...

1 month ago cs.CL PDF

Attack HIGH

When Think-with-Image Meets Safety: What Determines Multimodal Jailbreak Robustness?

Yuan Tian, Bing Hu, Fang Wu +3 more

Think-with-image reasoning is emerging as a new inference paradigm for large vision-language models, but its safety implications remain poorly...

1 month ago cs.CV cs.AI cs.CL PDF

Benchmark LOW

FinBoardBench: Benchmarking Dynamic Wealth Management and Strategic Financial Reasoning of LLMs via Board Game Simulations

Xuesi Hu, Peng Wang, Jinpeng Miao +7 more

Recently, large language models (LLMs) have achieved superior performance in static financial reasoning and simple dynamic trading tasks. However,...

1 month ago cs.CL cs.CE PDF

Attack MEDIUM

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

Xiang Fang, Wanlong Fang

Large Language Models (LLMs) are increasingly vulnerable to adversarial prompts that exploit semantic ambiguities to bypass safety mechanisms,...

1 month ago cs.CR cs.AI cs.CV PDF

Attack HIGH

Density-aware Sample-specific Attack

Qiyuan Wang, Yao Li, Raymond K. W. Wong

Despite recent progress in backdoor attacks, existing methods remain susceptible to post-training defenses that erase the backdoor through...

1 month ago cs.LG cs.CR PDF

Benchmark LOW

SYNAPSE: Neuro-Symbolic Visual Thought-to-Text Decoding via Topological Semantic Denoising

Akshaj Murhekar, Abhijit Mishra

Recent advances in large language models have accelerated open-vocabulary EEG-to-imagined-text decoding, where non-invasive neural activity recorded...

1 month ago cs.LG PDF

Tool LOW

Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

Aman Priyanshu, Supriti Vijay, Esha Pahwa

LLM safety evaluations predominantly test models in isolation, yet deployed AI agents increasingly operate within persistent social environments...

1 month ago cs.AI PDF

Benchmark MEDIUM

Escape the Language Prior: Mitigating Late-Stage Modality Collapse in Audio Reasoning via Modality-Aware Policy Optimization

Cihan Xiao, Yiwen Shao, Chenxing Li +5 more

Audio and omni-modal large language models exhibit impressive cross-modal reasoning capabilities. However, applying standard reinforcement learning...

1 month ago cs.CL PDF

Attack MEDIUM

Cross-Entropy Games and Frost Training

Arthur Renard, Franck Gabriel, Valentin Hartmann +1 more

We present Frost Training, a method for improving Monte Carlo-based policy optimization for a large family of LLM-as-a-judge tasks called...

1 month ago cs.AI PDF

Attack HIGH

Backdoor Attacks on Fault Detection and Localization in Cyber-Physical Systems

Abile Jean, Kuniyilh S

Cyber-Physical Systems (CPS) integrate sensing, communication, computation, and control to support critical infrastructure, including smart grids,...

1 month ago cs.CR cs.AI cs.LG PDF

Attack HIGH

Poison with Style: A Practical Poisoning Attack on Code Large Language Models

Khang Tran, Yazan Boshmaf, Issa Khalil +3 more

Code Large Language Models (CLLMs) serve as the core of modern code agents, enabling developers to automate complex software development tasks. In...

1 month ago cs.CR cs.LG PDF

Attack HIGH

PAST2HARM: A Simple Adaptive Past Tense Attack for Jailbreaking Multimodal AI

Snehasis Mukhopadhyay

Jailbreak attacks on multimodal AI systems remain underexplored, even though unsafe image generation can have more severe consequences than unsafe...

1 month ago cs.CL PDF

Other LOW

GENESIS: Harnessing AI Agents for Autonomous 6G RAN Synthesis, Research, and Testing

Tamerlan Aghayev, Maxime Elkael, Michele Polese +11 more

Cellular research and development (R&D) is throttled by six structural processes that each consume months of manual engineering work per iteration:...

1 month ago cs.NI cs.AI PDF

Attack HIGH

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

Dongyoon Hahm, Dylan Hadfield-Menell, Kimin Lee

Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In this work,...

1 month ago cs.AI cs.CL cs.LG PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
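
As a rough illustration of the classification described above, here is a minimal Python sketch of how a classified paper record might be filtered by type and relevance and then paged. The `PaperRecord` fields and the `filter_papers` and `page_bounds` helpers are hypothetical names chosen for this example; the site's actual schema and implementation are not documented on this page. The page size of 20 is an assumption inferred from the listing range of 301 to 320, and the CVE field is left empty because no CVE links appear in the listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperRecord:
    """Hypothetical record shape; not the site's documented schema."""
    title: str
    paper_type: str                       # e.g. "Attack", "Defense", "Benchmark"
    relevance: str                        # "HIGH", "MEDIUM", or "LOW"
    arxiv_categories: tuple[str, ...]     # e.g. ("cs.CR", "cs.AI")
    related_cves: tuple[str, ...] = ()    # empty here: none are shown in the listing


def filter_papers(papers, paper_type=None, relevance=None):
    """Return papers matching every filter that is set (None means 'All')."""
    return [
        p for p in papers
        if (paper_type is None or p.paper_type == paper_type)
        and (relevance is None or p.relevance == relevance)
    ]


def page_bounds(page, page_size, total):
    """Return the 1-based (first, last) positions shown on a 1-based page."""
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total)
    return first, last


# Three entries taken from the listing on this page.
papers = [
    PaperRecord("Cybersecurity AI (CAI) Dataset", "Benchmark", "MEDIUM", ("cs.CR",)),
    PaperRecord(
        "MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents "
        "via User-Generated Content",
        "Attack", "HIGH", ("cs.CR", "cs.AI", "cs.CL"),
    ),
    PaperRecord("Density-aware Sample-specific Attack", "Attack", "HIGH",
                ("cs.LG", "cs.CR")),
]

high_attacks = filter_papers(papers, paper_type="Attack", relevance="HIGH")
print([p.title for p in high_attacks])

# Assuming 20 results per page, page 16 of 1,983 results covers 301 to 320.
print(page_bounds(16, 20, 1983))
```

Treating `None` as "All" keeps each filter independent, so a type filter and a relevance filter can be combined or omitted without special cases.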

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
