AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total

3,023
Attack

1,175
Benchmark

866
Defense

407
Tool

319
Survey

176

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 881–900 of 1,175 papers

Clear filters

Attack HIGH

Steering in the Shadows: Causal Amplification for Activation Space Attacks in Large Language Models

Zhiyuan Xu, Stanislav Abaimov, Joseph Gardiner +1 more

Modern large language models (LLMs) are typically secured by auditing data, prompts, and refusal policies, while treating the forward pass as an...

7 months ago cs.CR PDF

Attack MEDIUM

MURMUR: Using cross-user chatter to break collaborative language agents in groups

Atharv Singh Patlan, Peiyao Sheng, S. Ashwin Hebbar +2 more

Language agents are rapidly expanding from single-user assistants to multi-user collaborators in shared workspaces and groups. However, today's...

7 months ago cs.CR cs.AI cs.CL PDF

Attack MEDIUM

Evaluating Adversarial Vulnerabilities in Modern Large Language Models

Tom Perel

The recent boom and rapid integration of Large Language Models (LLMs) into a wide range of applications warrants a deeper understanding of their...

7 months ago cs.CR cs.AI PDF

Attack HIGH

"To Survive, I Must Defect": Jailbreaking LLMs via the Game-Theory Scenarios

Zhen Sun, Zongmin Zhang, Deqi Liang +8 more

As LLMs become more common, non-expert users can pose risks, prompting extensive research into jailbreak attacks. However, most existing black-box...

7 months ago cs.CR cs.AI PDF

Attack MEDIUM

PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization

Huseein Jawad, Nicolas Brunel

System prompts are critical for guiding the behavior of Large Language Models (LLMs), yet they often contain proprietary logic or sensitive...

7 months ago cs.CR cs.CL PDF

Attack HIGH

Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models

Yijun Yang, Lichao Wang, Jianping Zhang +3 more

The growing misuse of Vision-Language Models (VLMs) has led providers to deploy multiple safeguards, including alignment tuning, system prompts, and...

7 months ago cs.CR PDF

Attack HIGH

AutoBackdoor: Automating Backdoor Attacks via LLM Agents

Yige Li, Zhe Li, Wei Zhao +4 more

Backdoor attacks pose a serious threat to the secure deployment of large language models (LLMs), enabling adversaries to implant hidden behaviors...

7 months ago cs.CR cs.AI PDF

Attack HIGH

What Your Features Reveal: Data-Efficient Black-Box Feature Inversion Attack for Split DNNs

Zhihan Ren, Lijun He, Jiaxi Liang +3 more

Split DNNs enable edge devices by offloading intensive computation to a cloud server, but this paradigm exposes privacy vulnerabilities, as the...

7 months ago cs.CV PDF

Attack HIGH

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

Piercosma Bisconti, Matteo Prandi, Federico Pierucci +7 more

We present evidence that adversarial poetry functions as a universal single-turn jailbreak technique for Large Language Models (LLMs). Across 25...

7 months ago cs.CL cs.AI PDF

Attack HIGH

Securing AI Agents Against Prompt Injection Attacks

Badrinath Ramakrishnan, Akshaya Balaji

Retrieval-augmented generation (RAG) systems have become widely used for enhancing large language model capabilities, but they introduce significant...

7 months ago cs.CR cs.AI PDF

Attack HIGH

Unified Defense for Large Language Models against Jailbreak and Fine-Tuning Attacks in Education

Xin Yi, Yue Li, Dongsheng Shi +3 more

Large Language Models (LLMs) are increasingly integrated into educational applications. However, they remain vulnerable to jailbreak and fine-tuning...

7 months ago cs.CL PDF

Attack HIGH

Sigil: Server-Enforced Watermarking in U-Shaped Split Federated Learning via Gradient Injection

Zhengchunmin Dai, Jiaxiong Tang, Peng Sun +2 more

In decentralized machine learning paradigms such as Split Federated Learning (SFL) and its variant U-shaped SFL, the server's capabilities are...

7 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

SteganoBackdoor: Stealthy and Data-Efficient Backdoor Attacks on Language Models

Eric Xue, Ruiyi Zhang, Pengtao Xie

Modern language models remain vulnerable to backdoor attacks via poisoned data, where training inputs containing a trigger are paired with a target...

7 months ago cs.CR cs.CL cs.LG PDF

Attack HIGH

Beyond Fixed and Dynamic Prompts: Embedded Jailbreak Templates for Advancing LLM Security

Hajun Kim, Hyunsik Na, Daeseon Choi

As the use of large language models (LLMs) continues to expand, ensuring their safety and robustness has become a critical challenge. In particular,...

7 months ago cs.CR PDF

Attack HIGH

Dynamic Black-box Backdoor Attacks on IoT Sensory Data

Ajesh Koyatan Chathoth, Stephen Lee

Sensor data-based recognition systems are widely used in various applications, such as gait-based authentication and human activity recognition...

7 months ago cs.CR cs.LG PDF

Attack HIGH

GRPO Privacy Is at Risk: A Membership Inference Attack Against Reinforcement Learning With Verifiable Rewards

Yule Liu, Heyi Zhang, Jinyi Zheng +6 more

Membership inference attacks (MIAs) on large language models (LLMs) pose significant privacy risks across various stages of model training. Recent...

7 months ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Tuning for Two Adversaries: Enhancing the Robustness Against Transfer and Query-Based Attacks using Hyperparameter Tuning

Pascal Zimmer, Ghassan Karame

In this paper, we present the first detailed analysis of how training hyperparameters -- such as learning rate, weight decay, momentum, and batch...

7 months ago cs.LG cs.CR cs.CV PDF

Attack MEDIUM

DualTAP: A Dual-Task Adversarial Protector for Mobile MLLM Agents

Fuyao Zhang, Jiaming Zhang, Che Wang +6 more

The reliance of mobile GUI agents on Multimodal Large Language Models (MLLMs) introduces a severe privacy vulnerability: screenshots containing...

7 months ago cs.CR PDF

Attack MEDIUM

Efficient Adversarial Malware Defense via Trust-Based Raw Override and Confidence-Adaptive Bit-Depth Reduction

Ayush Chaudhary, Sisir Doppalpudi

The deployment of robust malware detection systems in big data environments requires careful consideration of both security effectiveness and...

7 months ago cs.CR cs.LG PDF

Attack MEDIUM

LLM Reinforcement in Context

Thomas Rivasseau

Current Large Language Model alignment research mostly focuses on improving model robustness against adversarial attacks and misbehavior by training...

7 months ago cs.CL cs.CR PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial