AI Security Research

AI Threat Alert indexes 3,082+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total: 3,082
Attack: 1,196
Benchmark: 883
Defense: 421
Tool: 321
Survey: 181

Showing 1561–1580 of 3,082 papers

Benchmark MEDIUM

Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in Inference-Time Interventions

Navita Goyal, Hal Daumé

Model steering, which involves intervening on hidden representations at inference time, has emerged as a lightweight alternative to finetuning for...

4 months ago cs.LG cs.AI cs.CL PDF

Benchmark MEDIUM

Private and interpretable clinical prediction with quantum-inspired tensor train models

José Ramón Pareja Monturiol, Juliette Sinnott, Roger G. Melko +1 more

Machine learning in clinical settings must balance predictive accuracy, interpretability, and privacy. Models such as logistic regression (LR) offer...

4 months ago cs.LG cs.CR quant-ph PDF

Attack HIGH

Learning to Inject: Automated Prompt Injection via Reinforcement Learning

Xin Chen, Jie Zhang, Florian Tramèr

Prompt injection is one of the most critical vulnerabilities in LLM agents; yet, effective automated attacks remain largely unexplored from an...

4 months ago cs.LG cs.AI PDF

Benchmark LOW

CASTLE: A Comprehensive Benchmark for Evaluating Student-Tailored Personalized Safety in Large Language Models

Rui Jia, Ruiyi Lan, Fengrui Liu +7 more

Large language models (LLMs) have advanced the development of personalized learning in education. However, their inherent generation mechanisms often...

4 months ago cs.CL PDF

Attack MEDIUM

Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification

Tao Huang, Rui Wang, Xiaofei Liu +3 more

Large vision-language models (LVLMs) have shown substantial advances in multimodal understanding and generation. However, when presented with...

4 months ago cs.LG PDF

Defense MEDIUM

ProMoral-Bench: Evaluating Prompting Strategies for Moral Reasoning and Safety in LLMs

Rohan Subramanian Thomas, Shikhar Shiromani, Abdullah Chaudhry +4 more

Prompt design significantly impacts the moral competence and safety alignment of large language models (LLMs), yet empirical comparisons remain...

4 months ago cs.AI cs.CL PDF

Attack HIGH

Clouding the Mirror: Stealthy Prompt Injection Attacks Targeting LLM-based Phishing Detection

Takashi Koide, Hiroki Nakano, Daiki Chiba

Phishing sites continue to grow in volume and sophistication. Recent work leverages large language models (LLMs) to analyze URLs, HTML, and rendered...

4 months ago cs.CR PDF

Attack HIGH

Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs

Yao Zhou, Zeen Song, Wenwen Qiang +4 more

Safety alignment mechanisms in Large Language Models (LLMs) often operate as latent internal states, obscuring the model's inherent capabilities....

4 months ago cs.CL PDF

Benchmark LOW

Beyond single-channel agentic benchmarking

Nelu D. Radpour

Contemporary benchmarks for agentic artificial intelligence (AI) frequently evaluate safety through isolated task-level accuracy thresholds,...

4 months ago cs.CY cs.AI cs.HC PDF

Attack HIGH

BadTemplate: A Training-Free Backdoor Attack via Chat Template Against Large Language Models

Zihan Wang, Hongwei Li, Rui Zhang +2 more

Chat template is a common technique used in the training and inference stages of Large Language Models (LLMs). It can transform input and output data...

4 months ago cs.CR PDF

Defense MEDIUM

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Zhenxiong Yu, Zhi Yang, Zhiheng Jin +19 more

As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security...

4 months ago cs.CR cs.AI PDF

Attack HIGH

SynAT: Enhancing Security Knowledge Bases via Automatic Synthesizing Attack Tree from Crowd Discussions

Ziyou Jiang, Lin Shi, Guowei Yang +3 more

Cyber attacks have become a serious threat to the security of software systems. Many organizations have built their security knowledge bases to...

4 months ago cs.CR PDF

Tool MEDIUM

Copyright Detective: A Forensic System to Evidence LLMs Flickering Copyright Leakage Risks

Guangwei Zhang, Jianing Zhu, Cheng Qian +12 more

We present Copyright Detective, the first interactive forensic system for detecting, analyzing, and visualizing potential copyright risks in LLM...

5 months ago cs.CL PDF

Attack HIGH

Visual Exclusivity Attacks: Automatic Multimodal Red Teaming via Agentic Planning

Yunbei Zhang, Yingqiang Ge, Weijie Xu +3 more

Current multimodal red teaming treats images as wrappers for malicious payloads via typography or adversarial noise. These attacks are structurally...

5 months ago cs.CR cs.CV cs.LG PDF

Attack HIGH

Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning

Ethan Rathbun, Wo Wei Lin, Alina Oprea +1 more

Simulated environments are a key piece in the success of Reinforcement Learning (RL), allowing practitioners and researchers to train decision making...

5 months ago cs.CR cs.LG cs.RO PDF

Attack HIGH

Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks

Jafar Isbarov, Murat Kantarcioglu

As AI agents automate critical workloads, they remain vulnerable to indirect prompt injection (IPI) attacks. Current defenses rely on monitoring...

5 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?

Ruixin Yang, Ethan Mendes, Arthur Wang +4 more

Vision-language models (VLMs) have demonstrated strong performance in image geolocation, a capability further sharpened by frontier multimodal large...

5 months ago cs.CR cs.AI PDF

Attack MEDIUM

Comparative Insights on Adversarial Machine Learning from Industry and Academia: A User-Study Approach

Vishruti Kakkad, Paul Chung, Hanan Hibshi +1 more

An exponential growth of Machine Learning and its Generative AI applications brings with it significant security challenges, often referred to as...

5 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases

Casey Ford, Madison Van Doren, Emily Dix

Multimodal large language models (MLLMs) are increasingly deployed in real-world systems, yet their safety under adversarial prompting remains...

5 months ago cs.CL cs.AI cs.HC PDF

Benchmark LOW

From Data to Behavior: Predicting Unintended Model Behaviors Before Training

Mengru Wang, Zhenqian Xu, Junfeng Fang +4 more

Large Language Models (LLMs) can acquire unintended biases from seemingly benign training data even without explicit cues or malicious content....

5 months ago cs.LG cs.AI cs.CL PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended. It covers adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can follow the work behind emerging AI threats.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,082+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
