AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 21–40 of 91 papers

Attack HIGH

Attacking the Trusted Imagination: Oracle-Level Integrity Attacks on Imagine-then-Act World Models

Linghan Chen, Kaiyan Ji, Minyu Guo

Many recent vision-language-action (VLA) policies adopt an imagine-then-act design. A world-action model (WAM) first imagines a short future as a...

5 days ago cs.LG cs.AI cs.CR PDF

Benchmark HIGH

When AUC 0.998 Is Not Enough: A Candidate Evaluation Protocol for Hidden-State Probes of Indirect Prompt Injection in Multimodal Computer-Use Agents

Yanhang Li, Zhichao Fan, Zexin Zhuang

Hidden-state probing -- a linear classifier on a frozen vision-language model's internal activations -- has emerged as an attractive evaluation tool...

5 days ago cs.LG PDF

Defense HIGH

Towards Robust Personalized Federated Learning: Vulnerability Assessment and Defense Co-Design

Mingyuan Fan, Cen Chen

The proliferation of IoT devices has fueled distributed edge systems to collect vast amounts of sensitive data, creating fertile ground for on-device...

5 days ago cs.LG cs.CR PDF

Attack HIGH

Confidently Wrong: Severity-Aware Calibration of Prompt-Injection Detectors under Attack Shift

Md Anas Biswas

Prompt-injection detectors are deployed as guards: a model scores an input and a downstream system trusts or blocks it on that score. I study the...

6 days ago cs.CR cs.AI cs.LG PDF

Benchmark HIGH

RAVEN: Agentic RAG for Automated Vulnerability Repair

Varun Gadey, Zijie Liu, Alexandra Dmitrienko

Automated vulnerability repair has emerged as a promising direction to mitigate the growing number of software vulnerabilities. Recent advances in...

6 days ago cs.CR cs.LG cs.SE PDF

Attack HIGH

The Scissors Effect: When Resize-Based Input Diversity Helps or Hurts Transfer Attacks

Yuhang Jiang, Xiaojing Chen

Input Diversity (DI), which applies random resizing and padding at each attack iteration, is a near-default ingredient of transfer-based adversarial...

6 days ago cs.LG cs.CR cs.CV PDF

Tool HIGH

Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems

Reza Soosahabi, Vivek Namsani

Agentic AI systems increasingly rely on language-model components to interpret instructions, process external data, invoke tools, and coordinate with...

1 week ago cs.CR cs.AI PDF

Benchmark HIGH

LLM agent safety, multi-turn red-teaming, jailbreak benchmarks, adversarial robustness, safety-critical systems

Hanwool Lee, Dasol Choi, Bokyeong Kim +2 more

Large language model (LLM) agents are increasingly proposed as supervisory components for safety-critical systems, yet their robustness under...

1 week ago cs.CR cs.AI PDF

Benchmark HIGH

FFinRED: An Expert-Guided Benchmark Generation and Evaluation Framework for Financial LLM Red-Teaming

Chaeyun Kim, Daeyoung Park, Junghwan Kim +4 more

Existing safety benchmarks target general adversarial scenarios but miss finance-specific risks. Financial LLMs face regulatory compliance...

1 week ago cs.CR cs.AI PDF

Tool HIGH

A Layered Security Framework Against Prompt Injection in RAG-Based Chatbots

Gulshan Saleem, Nisar Ahmed, Muhammad Imran Zaman +1 more

Prompt injection is ranked as the most critical vulnerability in large language model (LLM) deployments by the OWASP Top 10 for LLM Applications, yet...

1 week ago cs.CR cs.CL PDF

Attack HIGH

CodeSentinel: A Three-Layer Defense Against Indirect Prompt Injection in Code Contexts

Po-Han Cheng, Chia-Mu Yu, Ying-Dar Lin +2 more

Code large language models increasingly retrieve external code context from repositories, documentation, issue threads, and coding-agent...

1 week ago cs.CR PDF

Attack HIGH

PhantomSkill: Malicious Code Injection in Agent Skill Ecosystems

Yu-Ting Lin, Chia-Mu Yu

Agent skills allow LLM-based coding agents to acquire domain-specific capabilities from third-party packages, but they also introduce a new...

1 week ago cs.CR PDF

Attack HIGH

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

Nahum Korda, Gadi Evron

Automated vulnerability discovery in large codebases remains challenging: traditional static analysis produces high false-positive rates, while...

1 week ago cs.CR cs.LG PDF

Tool HIGH

Image Prompt Reconstruction Attacks on Distributed MLLM Inference Frameworks

Xinjian Luo, Hongyan Chang, Jianxin Wei +5 more

Distributed large language model (LLM) inference frameworks connect isolated consumer-grade devices for large-scale model inference, substantially...

1 week ago cs.CR PDF

Attack HIGH

Understanding and Mitigating Prompt Leaking Attacks in Real-World LLM-Based Applications

Yong Yang, Chong Fu, Tong Zhang +6 more

Large language model (LLM)-based applications rely on system prompts to encode core logic and developer-defined constraints, making these prompts...

1 week ago cs.CR PDF

Attack HIGH

Seeing Is Not Screening: Multimodal Hidden Instruction Attacks on Agent Skill Scanners

Xiaojun Jia, Jie Liao, Simeng Qin +5 more

Agent skills are emerging as an important attack surface in LLM-based systems. Through an empirical study of existing skill scanners, we find that...

1 week ago cs.CR cs.CV PDF

Survey HIGH

A Red-Team Study of Anthropic Fable 5 & Opus 4.8 Models

Nicola Franco

We evaluate the adversarial robustness of two frontier large language models (LLMs) developed by Anthropic, Fable 5 and Opus 4.8, against four...

1 week ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Structural Role Injection in Handlebars-Templated LLM Prompts: Triple-Brace Interpolation, Delimiter Family, and the Limits of HTML Auto-Escaping

Mohammadreza Rashidi

Large language model applications build prompts from templates, and Handlebars is a widely used templating engine and the default prompt-template...

1 week ago cs.CR cs.AI cs.CL PDF

Tool HIGH

Conflict-Aware Retriever Editing for Knowledge Injection Attacks on LLM-Based RAG Systems

Xinru Liu, Xianglong Zhang, Di Cai +3 more

Injecting malicious knowledge into retrieval-augmented generation (RAG) systems can manipulate retrieved evidence and mislead downstream generation,...

1 week ago cs.CR cs.AI PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
