AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Papers by type

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 201–220 of 3,023 papers

Survey LOW

The Chronicles of Radio Frequency Fingerprinting

Abdul Aziz, Ingrid Huso, Savio Sciancalepore +1 more

Radio Frequency Fingerprinting (RFF) has evolved from an early idea for radar emitter identification into a broad research field for wireless device...

2 weeks ago cs.CR PDF

Attack HIGH

Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO

Blake Bullwinkel, Eugenia Kim, Amanda Minnich +1 more

AI red teaming must continually adapt to evolving attackers and defenders. Reinforcement learning offers a promising approach to discovering novel...

2 weeks ago cs.CL cs.AI cs.LG PDF

Attack HIGH

What the Eyes See, the LLMs Miss: Exploiting Human Perception for Adversarial Text Attacks

Qin Yang, Lu Malloy, Joshua Lee +4 more

Large language model (LLM)-powered content moderation systems have become a critical defense against harmful online content. However, these systems...

2 weeks ago cs.CR cs.HC cs.LG PDF

Defense LOW

Gradient-Guided Reward Optimization for Inference-time Alignment

Hankun Lin, Ruqi Zhang

Ensuring the reliability of Large Language Models (LLMs) under distribution drift requires inference-time adaptation. While inference-time alignment...

2 weeks ago cs.CL cs.LG PDF

Attack MEDIUM

PRISM: Recovering Instruction Sets from Language Model Activations

Gilad Gressel, Rahul Pankajakshan, Julia Diament +3 more

As LLMs are deployed as agents, reliable monitoring requires knowing not only what they output, but which instructions are steering their behavior....

2 weeks ago cs.AI cs.LG PDF

Benchmark MEDIUM

Safe-RULE: Safe Reinforcement UnLEarning

Shixiong Jiang, Taozheng Zhu, Fanxin Kong

Offline safe reinforcement learning (Safe RL) enables policy learning without online interactions, making it suitable for safety-critical systems...

2 weeks ago cs.LG cs.AI cs.CR PDF

Benchmark MEDIUM

AI Scientists Are Only as Good as Their Evidence: A Stratified Ablation of Proprietary Data and Reasoning Skills in Drug-Asset Valuation

Yinan Wang

AI Scientist agents are often evaluated as if capability were mainly a function of model quality, prompting, or reasoning scaffolds. We test a...

2 weeks ago cs.AI PDF

Survey MEDIUM

SecureClaw: Clawing Back Control of LLM Agents

Yuhan Ma, Stefan Schmid

Tool-using large language model (LLM) agents face two distinct security failures: unauthorized external actions and exposure of sensitive plaintext...

2 weeks ago cs.CR cs.AI PDF

Attack MEDIUM

Model Poisoning Against Federated Model Adaptation with Chain of Bit-Flips

Bastien Vuillod, Kevin Hector, Pierre-Alain Moellic +2 more

Federated Learning (FL) allows a set of clients to collectively train a global model without sharing local training data. Giving the responsibility...

2 weeks ago cs.CR cs.AI PDF

Benchmark MEDIUM

Targeting World Models to Compromise Robot Learning Pipelines

Ethan Rathbun, Ahmed Agha, Saaduddin Mahmud +3 more

World models have recently seen a rapid growth in both their popularity and capability as more data efficient tools for generating robot training...

2 weeks ago cs.RO cs.AI cs.CR PDF

Benchmark LOW

Now You (Still) See Me: Detecting Evasive Steganographic Payloads in LLMs

Charles Westphal, Timothy Douglas, Keivan Navaie +2 more

Large language models can be fine-tuned to encode prompt-borne secrets into fluent, seemingly benign outputs. This creates a steganographic...

2 weeks ago cs.CR cs.IT cs.LG PDF

Benchmark MEDIUM

Can Data Work be Reparative?

Srravya Chandhiramowuli, Ding Wang, Alex Taylor

We present an ethnographic study of an alternative approach to data work, developed by a civic-tech initiative that builds datasets for training and...

2 weeks ago cs.CY cs.AI cs.HC PDF

Benchmark MEDIUM

Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

Bartłomiej Marek, Lorenzo Rossi, Vincent Hanke +4 more

Recent work has applied differential privacy (DP) to adapt large language models (LLMs) for sensitive applications, offering theoretical guarantees....

2 weeks ago cs.LG cs.CR PDF

Attack HIGH

Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents

Jianwei Tai

BCI-to-agent pipelines turn decoded neural activity into an authorization channel for tool-use agents, exposing a new attack surface we call...

2 weeks ago cs.CR cs.AI PDF

Attack HIGH

The Injection Paradox: Brand-Level Suppression in Safety-Trained LLM Recommendations via RAG Context Injection

Hyunseok Paeng

We present a reproducible failure mode of safety training in RAG-based LLM recommendation -- the Injection Paradox -- in which prompt injections...

2 weeks ago cs.LG cs.CL cs.CR PDF

Benchmark HIGH

Culturally-Adapted Red-Teaming Across East and Southeast Asian Contexts: A Methodological and Comparative Analysis

Hyeji Choi, Yongtaek Lim, Minwoo Kim

Multilingual safety evaluation of large language models (LLMs) has predominantly relied on direct translation (DT) of English benchmarks into target...

2 weeks ago cs.CL cs.AI PDF

Attack MEDIUM

Customization under Fire: Plugin Poisoning in Text-to-Image Ecosystem

Jiahao Chen, Xing He, Yong Yang +6 more

The prosperity of text-to-image (T2I) models has fostered a vibrant share-and-play ecosystem centered on Low-Rank Adaptation (LoRA) plugins, which...

2 weeks ago cs.CR PDF

Defense LOW

Unveiling Privacy Risks in Multi-modal Large Language Models: Task-specific Vulnerabilities and Mitigation Challenges

Tiejin Chen, Pingzhi Li, Kaixiong Zhou +2 more

Privacy risks in text-only Large Language Models (LLMs) are well studied, particularly their tendency to memorize and leak sensitive information....

2 weeks ago cs.CR cs.AI PDF

Defense LOW

Autonomous Incident Resolution at Hyperscale: An Agentic AI Architecture for Network Operations

Arun Malik

Cloud network infrastructure at hyperscale presents unique operational challenges where traditional human-driven incident response cannot keep pace...

2 weeks ago cs.SE cs.AI cs.ET PDF

Tool HIGH

Context-Fractured Decomposition Attacks on Tool-Using LLM Agents: Exploiting Artifact Provenance Gaps

Xiaofeng Lin, Yukai Yang, Daniel Guo +3 more

Tool-using LLM agents interact with the world through actions that persist state in artifacts (e.g., workspace files or logs). Consequently,...

2 weeks ago cs.CR cs.AI PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
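As a rough illustration of the classification described above, the following Python sketch models one indexed paper and a filter by type and relevance. The names `Paper`, `PaperType`, `Relevance`, `related_cves`, and `filter_papers` are hypothetical and chosen for this example only; they do not describe AI Threat Alert's actual schema or API. The sample records reuse titles, labels, and arXiv categories from the listing on this page.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class PaperType(Enum):
    ATTACK = "Attack"
    DEFENSE = "Defense"
    BENCHMARK = "Benchmark"
    SURVEY = "Survey"
    TOOL = "Tool"


class Relevance(Enum):
    # Ordered so that a higher value means higher relevance.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Paper:
    """Hypothetical record for one indexed paper (not the site's real schema)."""
    title: str
    paper_type: PaperType
    relevance: Relevance
    arxiv_categories: List[str] = field(default_factory=list)
    # Assumed field for the CVE cross-references; left empty here because
    # the listing does not show which CVEs a paper is linked to.
    related_cves: List[str] = field(default_factory=list)


def filter_papers(
    papers: List[Paper],
    paper_type: Optional[PaperType] = None,
    min_relevance: Relevance = Relevance.LOW,
) -> List[Paper]:
    """Keep papers of the given type (any type if None) at or above min_relevance."""
    return [
        p
        for p in papers
        if (paper_type is None or p.paper_type is paper_type)
        and p.relevance.value >= min_relevance.value
    ]


# Three entries taken from the listing on this page.
SAMPLE = [
    Paper(
        "The Chronicles of Radio Frequency Fingerprinting",
        PaperType.SURVEY,
        Relevance.LOW,
        ["cs.CR"],
    ),
    Paper(
        "PRISM: Recovering Instruction Sets from Language Model Activations",
        PaperType.ATTACK,
        Relevance.MEDIUM,
        ["cs.AI", "cs.LG"],
    ),
    Paper(
        "Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents",
        PaperType.ATTACK,
        Relevance.HIGH,
        ["cs.CR", "cs.AI"],
    ),
]

if __name__ == "__main__":
    for paper in filter_papers(SAMPLE, PaperType.ATTACK, Relevance.HIGH):
        print(paper.relevance.name, paper.title)
```

Modeling relevance as an ordered enum lets a single threshold cover both a "High" view and a "Medium and above" view, which is one plausible way to read the relevance labels; the site may well implement this differently.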

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
