AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 201–220 of 950 papers

Defense MEDIUM

GenAI-Driven Threat Detection with Microsoft Security Copilot

Scott Freitas, Amir Gharib

Defending against today's increasingly sophisticated cyberattacks requires security analysts to continuously translate evolving attacker tradecraft...

1 month ago cs.CR cs.AI cs.LG

Benchmark MEDIUM

Hack-Verifiable Environments: Towards Evaluating Reward Hacking at Scale

Amit Roth, Ankur Samanta, Matan Halevy +2 more

Aligning autonomous agents with human intent remains a central challenge in modern AI. A key manifestation of this challenge is reward hacking,...

1 month ago cs.LG cs.AI

Survey MEDIUM

Refusal Evaluation in Coding LLMs and Code Agents: A Systematic Review of Thirteen Malicious-Code Prompt Corpora (2023-2025)

Richard J. Young, Gregory D. Moody

The evaluation of large language model refusal on malicious-coding tasks now spans at least thirteen publicly released prompt corpora (AdvBench, the...

1 month ago cs.CR

Defense MEDIUM

BiRD: A Bidirectional Ranking Defense Mechanism for Retrieval Augmented Generation

Chengcai Gao, Zhihong Sun, Xiaochuan Shi +2 more

The growing adoption of Retrieval-Augmented Generation (RAG) has led to a rise in adversarial attacks. Existing defenses, relying on semantic...

1 month ago cs.CR cs.IR

Attack MEDIUM

Detecting Fluent Optimization-Based Adversarial Prompts via Sequential Entropy Changes

Mohammed Alshaalan, Miguel R. D. Rodrigues

Optimization-based adversarial suffixes can jailbreak aligned large language models (LLMs) while remaining fluent, weakening static and windowed...

1 month ago cs.LG cs.AI

Attack MEDIUM

Auditing Privacy in Multi-Tenant RAG under Account Collusion

Florian A. D. Burnat, Brittany I. Davidson

Multi-tenant retrieval-augmented generation (RAG) services advertise per-account differential privacy as the operative leakage boundary: each...

1 month ago cs.CR cs.IR cs.LG

Defense MEDIUM

Measuring Safety Alignment Effects in Autonomous Security Agents

Isaac David, Arthur Gervais

Do stock safety-aligned language models and their uncensored or abliterated derivatives behave differently when run as autonomous security agents?...

1 month ago cs.CR cs.AI

Benchmark MEDIUM

The Evaluation Game: Beyond Static LLM Benchmarking

Paul Wang, Jade Garcia-Bourrée, Anne-Marie Kermarrec +1 more

As jailbreaks, adversarially crafted inputs that bypass safety constraints, continue to be discovered in Large Language Models, practitioners...

1 month ago cs.LG cs.AI

Attack MEDIUM

Exploring and Developing a Pre-Model Safeguard with Draft Models

Hongyu Cai, Arjun Arunasalam, Yiming Liang +2 more

Large Language Model (LLM) alignment remains vulnerable to jailbreak attacks that elicit unsafe responses, motivating pre-model and post-model...

1 month ago cs.CR cs.AI

Benchmark MEDIUM

Backdooring Masked Diffusion Language Models

Daniel Yiming Cao, Chengzhong Wang, Sheng-Yen Chou +3 more

Masked diffusion language models (MDLMs) are emerging as a compelling new paradigm for text generation, but their training-time security remains...

1 month ago cs.LG cs.CR

Attack MEDIUM

Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models

Tobias Braun, Jonas Henry Grebe, Hossein Shakibania +2 more

Unified autoregressive models (UAMs) are transformer models that generate text as well as image tokens within a single autoregressive pass. Shared...

1 month ago cs.CR cs.AI

Tool MEDIUM

Agent Security is a Systems Problem

Mihai Christodorescu, Earlence Fernandes, Ashish Hooda +11 more

We take the position that agent security must be approached as a systems problem: the AI model powering the agent must be treated as an untrusted...

1 month ago cs.CR cs.AI

Benchmark MEDIUM

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

Yubin Qu, Ying Zhang, Yanjun Zhang +4 more

Coding agents now run autonomously with shell, file, and network privileges. When a user issues a benign request, the agent sometimes does more than...

1 month ago cs.SE cs.AI cs.CL

Attack MEDIUM

OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences

Kaixiang Wang, Jiong Lou, Zhaojiacheng Zhou +1 more

Memory-augmented large language model (LLM) agents use iterative reflection and self-evolution to solve complex tasks, but these mechanisms introduce...

1 month ago cs.CR cs.AI cs.LG

Benchmark MEDIUM

REBAR: Reference Ethical Benchmark for Autonomy Readiness

Jonathan Diller, David Barnes, Rebekah Bogdanoff +14 more

As autonomous systems grow more advanced, objective metrics to evaluate their ethical and legal compliance are critical for informing end users of...

1 month ago cs.RO cs.CY

Tool MEDIUM

Prompts Don't Protect: Architectural Enforcement via MCP Proxy for LLM Tool Access Control

Rohith Uppala

Large language models increasingly operate as autonomous agents that select and invoke tools from large registries. We identify a critical gap: when...

1 month ago cs.CR cs.AI

Defense MEDIUM

Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction

Jiahe Guo, Xiangran Guo, Jiaxuan Chen +6 more

Multimodal large language models (MLLMs) often fail to transfer safety capabilities learned in the text modality to semantically equivalent non-text...

1 month ago cs.AI cs.CR

Benchmark MEDIUM

LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection

Lei Zhao, Abhay Bhaskar, Edgar Dobriban

AI agents such as OpenClaw are increasingly deployed in local workflows with access to external tools. This creates indirect prompt-injection (IPI)...

1 month ago cs.CR cs.AI

Defense MEDIUM

From Detection to Response: A Deep Learning and Retrieval-Augmented Generation Framework for Network Intrusion Mitigation

Md Navid Bin Islam, Sajal Saha

Machine-learning-based Intrusion Detection Systems (IDS) have achieved impressive accuracy in classifying network attacks, yet they consistently fall...

1 month ago cs.CR

Other MEDIUM

LLM-Based Static Verification of Code Against Natural-Language Requirements: An Industrial Experience Report

Zhi Quan Zhou, Dave Towey, Tsong Yueh Chen

Large language models (LLMs) are increasingly used to generate requirements specifications, design documents, code, and test cases. In contrast, much...

1 month ago cs.SE

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
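
As a rough illustration of what "classified by type and by relevance" could look like in practice, the sketch below models an index entry and filters a small list of entries. It is not AI Threat Alert's actual schema or API: the `Paper` class, its field names, and the `filter_papers` and `count_by_type` helpers are all hypothetical. The two sample records reuse titles, labels, and arXiv tags from the listing on this page, and `related_cves` is left empty because no CVE links appear here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Paper:
    """Hypothetical index record; field names are assumptions for illustration."""
    title: str
    paper_type: str                      # e.g. "Attack", "Defense", "Benchmark", "Survey", "Tool"
    relevance: str                       # e.g. "High" or "Medium"
    arxiv_tags: tuple[str, ...]          # arXiv subject categories
    related_cves: tuple[str, ...] = ()   # cross-references; empty when none are known


def filter_papers(
    papers: Sequence[Paper],
    paper_type: Optional[str] = None,
    relevance: Optional[str] = None,
) -> list[Paper]:
    """Keep papers matching every filter that is set; None means no filter."""
    return [
        p for p in papers
        if (paper_type is None or p.paper_type == paper_type)
        and (relevance is None or p.relevance == relevance)
    ]


def count_by_type(papers: Sequence[Paper]) -> dict[str, int]:
    """Tally papers per type, similar in spirit to the per-type totals on the page."""
    return dict(Counter(p.paper_type for p in papers))


# Two records taken from the listing on this page.
SAMPLE = [
    Paper(
        title="BiRD: A Bidirectional Ranking Defense Mechanism for Retrieval Augmented Generation",
        paper_type="Defense",
        relevance="Medium",
        arxiv_tags=("cs.CR", "cs.IR"),
    ),
    Paper(
        title="Agent Security is a Systems Problem",
        paper_type="Tool",
        relevance="Medium",
        arxiv_tags=("cs.CR", "cs.AI"),
    ),
]

defenses = filter_papers(SAMPLE, paper_type="Defense", relevance="Medium")
```

Treating an unset filter as `None` is one simple way to represent an "All" option, so the same function covers every combination of type and relevance.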

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
