AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 981–1000 of 1,050 papers

Attack HIGH

Proactive defense against LLM Jailbreak

Weiliang Zhao, Jinjun Peng, Daniel Ben-Levi +2 more

The proliferation of powerful large language models (LLMs) has necessitated robust safety alignment, yet these models remain vulnerable to evolving...

8 months ago cs.CR cs.CL PDF

Attack HIGH

Imperceptible Jailbreaking against Large Language Models

Kuofeng Gao, Yiming Li, Chao Du +4 more

Jailbreaking attacks on the vision modality typically rely on imperceptible adversarial perturbations, whereas attacks on the textual modality are...

8 months ago cs.CL cs.AI cs.CR PDF

Attack HIGH

RL Is a Hammer and LLMs Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection

Yuxin Wen, Arman Zharmagambetov, Ivan Evtimov +4 more

Prompt injection poses a serious threat to the reliability and safety of LLM agents. Recent defenses against prompt injection, such as Instruction...

8 months ago cs.CR cs.LG PDF

Attack HIGH

Unified Threat Detection and Mitigation Framework (UTDMF): Combating Prompt Injection, Deception, and Bias in Enterprise-Scale Transformers

Santhosh Kumar Ravindran

The rapid adoption of large language models (LLMs) in enterprise systems exposes vulnerabilities to prompt injection attacks, strategic deception,...

8 months ago cs.CR cs.AI PDF

Attack HIGH

SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations

Buyun Liang, Liangzu Peng, Jinqi Luo +3 more

Large Language Models (LLMs) are increasingly deployed in high-risk domains. However, state-of-the-art LLMs often exhibit hallucinations, raising...

8 months ago cs.CL cs.AI cs.CR PDF

Attack HIGH

VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy

Yu Cui, Sicheng Pan, Yifei Liu +2 more

Large language models (LLMs) have been widely deployed in Conversational AIs (CAIs), while exposing privacy and security threats. Recent research...

8 months ago cs.CR PDF

Attack HIGH

AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents

Yanjie Li, Yiming Cao, Dong Wang +1 more

Multimodal agents built on large vision-language models (LVLMs) are increasingly deployed in open-world settings but remain highly vulnerable to...

8 months ago cs.CR cs.AI PDF

Attack HIGH

Rounding-Guided Backdoor Injection in Deep Learning Model Quantization

Xiangxiang Chen, Peixin Zhang, Jun Sun +2 more

Model quantization is a popular technique for deploying deep learning models on resource-constrained environments. However, it may also introduce...

8 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods

Yulin Chen, Haoran Li, Yuan Sui +2 more

With the development of technology, large language models (LLMs) have dominated the downstream natural language processing (NLP) tasks. However,...

8 months ago cs.CR PDF

Attack HIGH

From Theory to Practice: Evaluating Data Poisoning Attacks and Defenses in In-Context Learning on Social Media Health Discourse

Rabeya Amin Jhuma, Mostafa Mohaimen Akand Faisal

This study explored how in-context learning (ICL) in large language models can be disrupted by data poisoning attacks in the setting of public health...

8 months ago cs.LG cs.CL cs.CR PDF

Attack HIGH

Explainable but Vulnerable: Adversarial Attacks on XAI Explanation in Cybersecurity Applications

Maraz Mia, Mir Mehedi A. Pritom

Explainable Artificial Intelligence (XAI) has aided machine learning (ML) researchers with the power of scrutinizing the decisions of the black-box...

8 months ago cs.CR cs.AI PDF

Attack HIGH

NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks

Javad Rafiei Asl, Sidhant Narula, Mohammad Ghasemigol +2 more

Large Language Models (LLMs) have revolutionized natural language processing but remain vulnerable to jailbreak attacks, especially multi-turn...

8 months ago cs.CR cs.AI PDF

Attack HIGH

LegalSim: Multi-Agent Simulation of Legal Systems for Discovering Procedural Exploits

Sanket Badhe

We present LegalSim, a modular multi-agent simulation of adversarial legal proceedings that explores how AI systems can exploit procedural weaknesses...

8 months ago cs.MA cs.AI cs.CR PDF

Attack HIGH

Untargeted Jailbreak Attack

Xinzhe Huang, Wenjing Hu, Tianhang Zheng +5 more

Existing gradient-based jailbreak attacks on Large Language Models (LLMs) typically optimize adversarial suffixes to align the LLM output with...

8 months ago cs.CR cs.AI PDF

Attack HIGH

External Data Extraction Attacks against Retrieval-Augmented Large Language Models

Yu He, Yifei Chen, Yiming Li +5 more

In recent years, RAG has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG...

8 months ago cs.CR PDF

Attack HIGH

Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs

Zhixin Xie, Xurui Song, Jun Luo

Despite substantial efforts in safety alignment, recent research indicates that Large Language Models (LLMs) remain highly susceptible to jailbreak...

8 months ago cs.CR PDF

Attack HIGH

A Statistical Method for Attack-Agnostic Adversarial Attack Detection with Compressive Sensing Comparison

Chinthana Wimalasuriya, Spyros Tragoudas

Adversarial attacks present a significant threat to modern machine learning systems. Yet, existing detection methods often lack the ability to detect...

8 months ago cs.CR cs.CV cs.LG PDF

Attack HIGH

ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks

Zhaorun Chen, Xun Liu, Mintong Kang +4 more

As vision-language models (VLMs) gain prominence, their multimodal interfaces also introduce new safety vulnerabilities, making the safety evaluation...

8 months ago cs.AI cs.LG PDF

Benchmark HIGH

RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents

Chengquan Guo, Chulin Xie, Yu Yang +6 more

Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic...

8 months ago cs.SE PDF

Tool HIGH

ToolTweak: An Attack on Tool Selection in LLM-based Agents

Jonathan Sneh, Ruomei Yan, Jialin Yu +6 more

As LLMs increasingly power agents that interact with external tools, tool use has become an essential mechanism for extending their capabilities....

8 months ago cs.CR cs.AI PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
