AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.


Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176


Showing 161–180 of 950 papers


Benchmark MEDIUM

Towards Demystifying and Repairing LLM-in-the-Loop Vulnerabilities

Yujie Ma, Jialin Rong, Chenxi Yang +4 more

Large Language Models (LLMs) have been actively integrated into modern software systems as critical components. LLM-in-the-loop vulnerabilities, where...

1 month ago cs.SE cs.CR PDF

Attack MEDIUM

A Wolf in Sheep's Clothing: Targeted Routing Hijacking in Federated RAG

Junjie Mu, Qiongxiu Li

Federated Retrieval-Augmented Generation (FedRAG) is attractive for privacy-sensitive applications because raw data remain local. As a result,...

1 month ago cs.CR cs.CL cs.IR PDF

Attack MEDIUM

SilentRetrieval: Hijacking Retrieval-Augmented Generation via Semantically-Preserving Adversarial Data Poisoning

Jiachen Qian

Retrieval-Augmented Generation (RAG) mitigates LLM hallucinations but introduces a critical vulnerability: corpus integrity. We present...

1 month ago cs.CR cs.CL cs.IR PDF

Benchmark MEDIUM

KSAFE-MM: A Multimodal Safety Benchmark via Localized Contextualization for Korean Cultural Risks

Yongwoo Kim, Sojung An, Yunjin Park +8 more

Multimodal Large Language Models (MLLMs) exacerbate safety risks by introducing vulnerabilities across multiple modalities, such as language and...

1 month ago cs.CL PDF

Attack MEDIUM

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

Xiang Fang, Wanlong Fang

Large Language Models (LLMs) are increasingly vulnerable to adversarial prompts that exploit semantic ambiguities to bypass safety mechanisms,...

1 month ago cs.CR cs.AI cs.CV PDF

Benchmark MEDIUM

Escape the Language Prior: Mitigating Late-Stage Modality Collapse in Audio Reasoning via Modality-Aware Policy Optimization

Cihan Xiao, Yiwen Shao, Chenxing Li +5 more

Audio and omni-modal large language models exhibit impressive cross-modal reasoning capabilities. However, applying standard reinforcement learning...

1 month ago cs.CL PDF

Attack MEDIUM

Cross-Entropy Games and Frost Training

Arthur Renard, Franck Gabriel, Valentin Hartmann +1 more

We present Frost Training, a method for improving Monte Carlo-based policy optimization for a large family of LLM-as-a-judge tasks called...

1 month ago cs.AI PDF

Benchmark MEDIUM

Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?

Syed Huma Shah

Modern retrieval-augmented generation (RAG) deployments increasingly rely on caching to reduce token cost and time-to-first-token (TTFT). Prefix-level...

1 month ago cs.CR cs.AI cs.CL PDF

Tool MEDIUM

HARP: Measuring Harm Amplification in Multi-Agent LLM Systems

Md Hafizur Rahman, Zafaryab Haider, Tanzim Mahfuz +1 more

Multi-agent LLM systems decompose workflows across agents, tools, shared context, memory, and decision gates. This modularity improves...

1 month ago cs.CR cs.AI cs.LG PDF

Attack MEDIUM

BAIT: Boundary-Guided Disclosure Escalation via Self-Conditioned Reasoning

Xuan Luo, Yue Wang, Geng Tu +2 more

In this work, we propose BAIT (Boundary-Aware Iterative Trap), a three-step jailbreak framework that approaches malicious goals through internal...

1 month ago cs.CR cs.CL PDF

Survey MEDIUM

On the Robustness of Machine Unlearning for Vision-Language Models

Yujie Lin, Kaidi Jia, Jiayao Ma +2 more

Vision-language models (VLMs) may memorize undesirable information from training data, motivating growing interest in machine unlearning. In this...

1 month ago cs.CV PDF

Benchmark MEDIUM

AlbanianLLMSafety: A Safety Evaluation Dataset for Large Language Models in Albanian

Wajdi Zaghouani, Kholoud K. Aldous, Isra Fejzullaj

Safety evaluation of Large Language Models (LLMs) has largely focused on high-resource languages, leaving low-resource languages critically...

1 month ago cs.CL PDF

Attack MEDIUM

EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation

Yunbo Long, Haolang Zhao, Lukas Beckenbauer +2 more

Post-trained LLMs are often optimized to align responses with human preferences, making them safe, polite, and conversationally appropriate. In...

1 month ago cs.CL cs.AI PDF

Attack MEDIUM

Cordon-MAS: Defending RAG against Knowledge Poisoning via Information-Flow Control

Zhe Yu, Wenpeng Xing, Gaolei Li +4 more

Retrieval-augmented generation (RAG) increasingly underpins high-stakes applications, yet remains vulnerable to Confundo-style poisoning where...

1 month ago cs.CR cs.AI PDF

Benchmark MEDIUM

GradSentry: Gradient Spectral Entropy for Backdoor Sample Filtering in Large Language Model Fine-Tuning

Haodong Zhao, Tianyi Xu, Tianhang Zhao +2 more

Fine-tuning Large Language Models with untrusted data exposes models to backdoor attacks, where poisoned samples cause targeted misbehavior. Existing...

1 month ago cs.CR PDF

Benchmark MEDIUM

SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks?

Hwiwon Lee, Jiawei Liu, Dongjun Kim +3 more

Large language models (LLMs) now support automated software security tasks, including vulnerability discovery and proof-of-concept (PoC) generation....

1 month ago cs.CR cs.LG PDF

Defense MEDIUM

Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents

Peiran Wang, Ying Li, Yuan Tian

LLM-based agents are increasingly deployed in high-stakes scenarios such as email management, financial transactions, and code execution, where they...

1 month ago cs.CR PDF

Benchmark MEDIUM

Constitutional Arms Races in the Public Goods Game: Co-Evolving LLM Constitutions Under Cooperation-Defection Pressure

Ujwal Kumar, Arth Singh, Hershraj Niranjani +5 more

Frontier LLM agents engage in blackmail, sabotage, and document leaks under goal conflicts in agentic settings, exposing limitations of alignment...

1 month ago cs.MA cs.GT cs.NE PDF

Defense MEDIUM

Curriculum Learning for Safety Alignment

Sandeep Kumar, Virginia Smith, Chhavi Yadav

Direct Preference Optimisation (DPO) is widely used for safety alignment in large language models. However, prior work shows it is brittle and...

1 month ago cs.LG cs.AI PDF

Benchmark MEDIUM

Building an Adversarial Malware Dataset by Family and Type: Generation, Evasion, and Poisoning Evaluation

David Košťál, Martin Jureček

We present a dataset of adversarial malware samples derived from the public RawMal-TF collection of real-world malware binaries. Using a suite of...

1 month ago cs.CR cs.LG PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
