AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.


Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 101–120 of 146 papers

Attack MEDIUM

Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security

Tu Lan, Chaowei Xiao

Agent skills let LLM agents reuse instructions, resources, tools, and workflows, but they also create a new place for malicious behavior to hide. A...

2 weeks ago cs.CR cs.AI PDF

Defense MEDIUM

Dummy Backdoor as a Defense: Removing Unknown Backdoors via Shared Internal Mechanisms for Generative LLMs

Kazuki Iwahana, Masaru Matsubayashi, Takuma Koyama +3 more

Backdoor attacks pose a serious threat to the safety and reliability of Large Language Models (LLMs), as they cause models to behave normally on...

2 weeks ago cs.CR cs.CL PDF

Benchmark MEDIUM

Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models

Malikeh Ehghaghi, Boglárka Ecsedi, Marsha Chechik +1 more

Adversarial robustness evaluations of large language models (LLMs) typically report attack success rate (ASR) under fixed query budgets, implicitly...

2 weeks ago cs.LG cs.AI cs.CR PDF

Benchmark MEDIUM

It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO

Naihao Deng, Yilun Zhu, Naichen Shi +2 more

Warning: This paper contains several toxic and offensive statements. Modern large language models (LLMs) are typically aligned through large-scale...

2 weeks ago cs.CL PDF

Defense MEDIUM

Comparative Analysis of Inference-Time Defense Methods for Multimodal Large Language Models

Bulat Nutfullin, Vladimir Evgrafov, Dmitry Namiot

Multimodal large language models (MLLMs) now appear in safety-critical applications, but the visual channel leaves them open to adversarial attacks...

2 weeks ago cs.CR PDF

Attack MEDIUM

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

Lena S. Bolliger, Lena A. Jäger

Production LLMs receive instructions from sources with very different levels of trust, yet attend to every token with uniform architectural...

2 weeks ago cs.CR cs.CL PDF

Attack MEDIUM

MemVenom: Triggered Poisoning of Multimodal Memories in Web Agents

Yv Zhang, Hao Sun, Hao Fang +5 more

External memory has become a core component of modern web agents, enabling long-horizon reasoning through the retrieval of past experiences. However,...

2 weeks ago cs.CR cs.LG PDF

Attack MEDIUM

Improving Adversarial Transferability on Vision-Language Pre-training Models via Surrogate-Specific Bias Correction

Lijia Yu, Jiuxin Cao, Yuchen Qiang +3 more

Adversarial examples reveal vulnerabilities in Vision-Language Pre-training (VLP) models and provide insights for improving robustness. A key...

2 weeks ago cs.CV cs.AI cs.CR PDF

Benchmark MEDIUM

AgentCanary: A Security Evaluation Framework for Autonomous AI Agents in Real Executable Environments

Peiyang Li, Songping Wang, Yi Huang +9 more

Autonomous AI agents have driven the transition from conversation to task execution, shifting security failures from textual deception to system...

2 weeks ago cs.CR PDF

Attack MEDIUM

Advancing the State-of-the-Art in Empirical Privacy Auditing

Nicole Mitchell, Galen Andrew, Arun Ganesh +2 more

Parameter-efficient fine-tuning of large language models (LLMs) can exhibit problematic memorization of individual training examples. Empirical...

2 weeks ago cs.LG cs.AI cs.CL PDF

Attack MEDIUM

Semantic Multi-Agent Intrusion Detection for IoT: Zero-Day and Adversarial Threats with Risk-Aware Reasoning

Saeid Jamshidi

The rapid proliferation of Internet of Things (IoT) devices has enabled unprecedented automation and connectivity, but it has also substantially...

2 weeks ago cs.CR PDF

Benchmark MEDIUM

Game-Theoretic Multi-Agent Control for Robust Contextual Reasoning in LLMs

Saeid Jamshidi, Amin Nikanjam, Arghavan Moradi Dakhel +2 more

Large Language Models (LLMs) in multi-turn interactions maintain evolving context rather than generating isolated responses, making them vulnerable...

2 weeks ago cs.CR cs.MA PDF

Benchmark MEDIUM

Baseline-Free Policy Optimization for Neural Combinatorial Optimization

Carlos S. Sepúlveda, Gonzalo A. Ruz

Neural combinatorial optimization (NCO) trains autoregressive policies to solve routing problems. The standard training algorithm, REINFORCE with a...

2 weeks ago cs.LG cs.AI cs.RO PDF

Tool MEDIUM

RadKey: An LLM-Guided RF Backscatter System for Through-Wall Keystroke Inference

Qijun Wang, Chunqi Qian, Huacheng Zeng

In today's digitally connected world, keyboards remain the primary interface for inputting sensitive information, making them a persistent target for...

2 weeks ago cs.CR PDF

Survey MEDIUM

SoK: Colluding Adversaries in Machine Learning Pipelines

Vasisht Duddu, Lipeng He, Asim Waheed +1 more

Machine learning (ML) models are susceptible to various security, privacy, and fairness risks. Adversaries with different characteristics (i.e.,...

2 weeks ago cs.CR cs.LG PDF

Attack MEDIUM

PRISM: Recovering Instruction Sets from Language Model Activations

Gilad Gressel, Rahul Pankajakshan, Julia Diament +3 more

As LLMs are deployed as agents, reliable monitoring requires knowing not only what they output, but which instructions are steering their behavior....

2 weeks ago cs.AI cs.LG PDF

Benchmark MEDIUM

Safe-RULE: Safe Reinforcement UnLEarning

Shixiong Jiang, Taozheng Zhu, Fanxin Kong

Offline safe reinforcement learning (Safe RL) enables policy learning without online interactions, making it suitable for safety-critical systems...

2 weeks ago cs.LG cs.AI cs.CR PDF

Benchmark MEDIUM

AI Scientists Are Only as Good as Their Evidence: A Stratified Ablation of Proprietary Data and Reasoning Skills in Drug-Asset Valuation

Yinan Wang

AI Scientist agents are often evaluated as if capability were mainly a function of model quality, prompting, or reasoning scaffolds. We test a...

2 weeks ago cs.AI PDF

Survey MEDIUM

SecureClaw: Clawing Back Control of LLM Agents

Yuhan Ma, Stefan Schmid

Tool-using large language model (LLM) agents face two distinct security failures: unauthorized external actions and exposure of sensitive plaintext...

2 weeks ago cs.CR cs.AI PDF

Attack MEDIUM

Model Poisoning Against Federated Model Adaptation with Chain of Bit-Flips

Bastien Vuillod, Kevin Hector, Pierre-Alain Moellic +2 more

Federated Learning (FL) allows a set of clients to collectively train a global model without sharing local training data. Giving the responsibility...

2 weeks ago cs.CR cs.AI PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
