AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total papers: 3,023

Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176



Attack HIGH

Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction

Hongtao Wang, Se Yang, Yu Chen +1 more

Large language model (LLM) agents increasingly leverage long term memory to support persistent and autonomous task execution. However, this...

1 month ago · cs.CR cs.AI

Attack MEDIUM

Cert-LAS: Toward Certified Model Ownership Verification for Text-to-Image Diffusion Models via Layer-Adaptive Smoothing

Leyi Qi, Yiming Li, Siyuan Liang +2 more

Large-scale text-to-image (T2I) diffusion models have enabled unprecedented creative applications, but their unauthorized use has raised serious...

1 month ago · cs.CR cs.CV cs.GR

Attack MEDIUM

Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs

Alexander Sternfeld, Andrei Kucharavy, Ljiljana Dolamic

LLM-based coding assistants are seeing rapid adoption, offering substantial gains in developer productivity. As organizations increasingly ship code...

1 month ago · cs.CR cs.CL cs.SE

Attack HIGH

Opir: Efficient Multi-Task Safety Classification for Toxicity, Jailbreaks, Hate Speech, and Harmful Content

Ihor Stepanov, Aleksandr Smechov

Real-time safety filtering for large language model (LLM) applications requires classifiers that can detect unsafe prompts, toxic language, jailbreak...

1 month ago · cs.LG cs.AI cs.CL

Attack HIGH

Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures

Junyoung Park, Sunghwan Park, Seongyong Ju +1 more

Attack Success Rate (ASR) evaluates each jailbreak with a single yes/no label at the end of generation, telling us whether a failure happened but not...

1 month ago · cs.AI

Attack HIGH

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening

Mohan Zhang, Yuqi Jia, Zhen Tan +4 more

LLMs are vulnerable to prompt injection attacks. However, this vulnerability has been primarily demonstrated conceptually in academic studies or...

1 month ago · cs.CR cs.AI cs.CL

Attack MEDIUM

GraphSteal: Structural Knowledge Stealing from Graph RAG via Traversal Reconstruction

Jinze Gu, Qinghua Mao, Xi Lin +1 more

Retrieval-Augmented Generation (RAG) enhances LLMs by grounding generation in query-relevant external evidence. Beyond unstructured text corpora,...

1 month ago · cs.CR cs.CL

Attack HIGH

Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking

Ziyang You, Huilong He, Xiaoke Yang +1 more

Cryptographic watermarking is a leading defense for attributing text generated by large language models (LLMs). Existing schemes, including KGW,...

1 month ago · cs.CR cs.AI

Attack MEDIUM

LACUNA: Safe Agents as Recursive Program Holes

Yaoyu Zhao, Yichen Xu, Oliver Bračevac +3 more

LLM agents increasingly act by writing code, yet a split persists between the runtime that drives the agent and the code the model writes. The...

1 month ago · cs.AI cs.PL

Attack MEDIUM

Quantum-Enhanced Adversarial Robustness in Artificial Intelligence

Jaydip Sen

Artificial Intelligence has achieved remarkable success across diverse application domains. However, its vulnerability to adversarial attacks poses...

1 month ago · cs.CR cs.AI

Attack HIGH

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

Matteo Gioele Collu, Riccardo Conte, Alberto Giaretta +4 more

In this paper, we investigate whether refusal behavior can be predicted from LLM intermediate activations before decoding using linear probes trained...

1 month ago · cs.AI cs.CR

Attack HIGH

Mitigating Adaptive Attacks against Reasoning Models with Activation Consistency Training

Avidan Shah, Jannik Brinkmann, Rico Angell

As LLMs gain stronger reasoning capabilities, their extended chain-of-thought introduces new degrees of complexity for defending against adversarial...

1 month ago · cs.LG

Attack HIGH

Out of Sight, Not Out of Mind: Unveiling Latent Attack in Latent-based Multi-Agent Systems

Chenxi Wang, Ruiyang Huang, Jiayan Sun +2 more

Latent-based multi-agent systems replace parts of explicit inter-agent communication with hidden representations, offering a new direction for...

1 month ago · cs.CR cs.LG cs.MA

Attack HIGH

Plant, Persist, Trigger: Sleeper Attack on Large Language Model Agents

Yongxiang Li, Moxin Li, Zhixin Ma +4 more

Large Language Model (LLM) agents remain vulnerable to safety threats from the external environment, where attackers inject adversarial content into...

1 month ago · cs.AI

Attack HIGH

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

Ruoqi Guo, Yi Liu, Gelei Deng +7 more

Mobile graphical user interface (GUI) agents driven by vision-language models (VLMs) perceive the screen as rendered pixels and choose actions from...

1 month ago · cs.CR cs.AI cs.CL

Attack MEDIUM

A Wolf in Sheep's Clothing: Targeted Routing Hijacking in Federated RAG

Junjie Mu, Qiongxiu Li

Federated Retrieval-Augmented Generation (FedRAG) is attractive for privacy-sensitive applications because raw data remain local. As a result,...

1 month ago · cs.CR cs.CL cs.IR

Attack MEDIUM

SilentRetrieval: Hijacking Retrieval-Augmented Generation via Semantically-Preserving Adversarial Data Poisoning

Jiachen Qian

Retrieval-Augmented Generation (RAG) mitigates LLM hallucinations but introduces a critical vulnerability: corpus integrity. We present...

1 month ago · cs.CR cs.CL cs.IR

Attack HIGH

Can It Reach the Generator? Investigating the Survival of Prompt-Injection Attacks in Realistic RAG Settings

Yu Yin, Shuai Wang, Bevan Koopman +1 more

Recent generative engine optimisation (GEO) research has shown that prompt-injection attacks can push a target product to the top of an LLM's...

1 month ago · cs.CR cs.IR

Attack HIGH

When Think-with-Image Meets Safety: What Determines Multimodal Jailbreak Robustness?

Yuan Tian, Bing Hu, Fang Wu +3 more

Think-with-image reasoning is emerging as a new inference paradigm for large vision-language models, but its safety implications remain poorly...

1 month ago · cs.CV cs.AI cs.CL

Attack MEDIUM

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

Xiang Fang, Wanlong Fang

Large Language Models (LLMs) are increasingly vulnerable to adversarial prompts that exploit semantic ambiguities to bypass safety mechanisms,...

1 month ago · cs.CR cs.AI cs.CV

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.
