AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total

3,023
Attack

1,175
Benchmark

866
Defense

407
Tool

319
Survey

176

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 401–420 of 1,175 papers

Clear filters

Attack MEDIUM

BadLLM-TG: A Backdoor Defender powered by LLM Trigger Generator

Ruyi Zhang, Heng Gao, Songlei Jian +2 more

Backdoor attacks compromise model reliability by using triggers to manipulate outputs. Trigger inversion can accurately locate these triggers via a...

3 months ago cs.CR cs.AI PDF

Attack HIGH

Activation Surgery: Jailbreaking White-box LLMs without Touching the Prompt

Maël Jenny, Jérémie Dentan, Sonia Vanier +1 more

Most jailbreak techniques for Large Language Models (LLMs) primarily rely on prompt modifications, including paraphrasing, obfuscation, or...

3 months ago cs.CR PDF

Attack HIGH

Safety-Potential Pruning for Enhancing Safety Prompts Against VLM Jailbreaking Without Retraining

Chongxin Li, Hanzhang Wang, Lian Duan

Safety prompts constitute an interpretable layer of defense against jailbreak attacks in vision-language models (VLMs); however, their efficacy is...

3 months ago cs.CV PDF

Attack HIGH

GroupGuard: A Framework for Modeling and Defending Collusive Attacks in Multi-Agent Systems

Yiling Tao, Xinran Zheng, Shuo Yang +2 more

While large language model-based agents demonstrate great potential in collaborative tasks, their interactivity also introduces security...

3 months ago cs.AI PDF

Attack HIGH

Sirens' Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs

Zijian Ling, Pingyi Hu, Xiuyong Gao +6 more

Speech-driven large language models (LLMs) are increasingly accessed through speech interfaces, introducing new security risks via open acoustic...

3 months ago cs.CR cs.AI cs.SD PDF

Attack MEDIUM

MoEKD: Mixture-of-Experts Knowledge Distillation for Robust and High-Performing Compressed Code Models

Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy

Large language models for code have achieved strong performance across diverse software analytics tasks, yet their real-world adoption remains...

3 months ago cs.SE PDF

Attack MEDIUM

Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference

Jianwei Li, Jung-Eun Kim

Backdoor attacks pose severe security threats to large language models (LLMs), where a model behaves normally under benign inputs but produces...

3 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses

Chenlong Yin, Runpeng Geng, Yanting Wang +1 more

Prompt injection poses serious security risks to real-world LLM applications, particularly autonomous agents. Although many defenses have been...

3 months ago cs.LG cs.CR PDF

Attack HIGH

SLICE: Semantic Latent Injection via Compartmentalized Embedding for Image Watermarking

Zheng Gao, Yifan Yang, Xiaoyu Li +4 more

Watermarking the initial noise of diffusion models has emerged as a promising approach for image provenance, but content-independent noise patterns...

3 months ago cs.CV cs.CR cs.LG PDF

Attack HIGH

Colluding LoRA: A Composite Attack on LLM Safety Alignment

Sihao Ding

We introduce Colluding LoRA (CoLoRA), an attack in which each adapter appears benign and plausibly functional in isolation, yet their linear...

3 months ago cs.CR cs.LG PDF

Attack MEDIUM

Neural Gate: Mitigating Privacy Risks in LVLMs via Neuron-Level Gradient Gating

Xiangkui Cao, Jie Zhang, Meina Kan +2 more

Large Vision-Language Models (LVLMs) have shown remarkable potential across a wide array of vision-language tasks, leading to their adoption in...

3 months ago cs.CV PDF

Attack HIGH

Agent Privilege Separation in OpenClaw: A Structural Defense Against Prompt Injection

Darren Cheng, Wen-Kwang Tsao

Prompt injection remains one of the most practical attack vectors against LLM-integrated applications. We replicate the Microsoft LLMail-Inject...

3 months ago cs.CR cs.AI PDF

Attack HIGH

Accelerating Suffix Jailbreak attacks with Prefix-Shared KV-cache

Xinhai Wang, Shaopeng Fu, Shu Yang +3 more

Suffix jailbreak attacks serve as a systematic method for red-teaming Large Language Models (LLMs) but suffer from prohibitive computational costs,...

3 months ago cs.CR cs.AI PDF

Attack HIGH

SpectralGuard: Detecting Memory Collapse Attacks in State Space Models

Davi Bonetto

State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a...

3 months ago cs.LG cs.CR PDF

Attack HIGH

CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks

Alexandre Le Mercier, Thomas Demeester, Chris Develder

State space models (SSMs) like Mamba have gained significant traction as efficient alternatives to Transformers, achieving linear complexity while...

3 months ago cs.CL PDF

Attack MEDIUM

EmbTracker: Traceable Black-box Watermarking for Federated Language Models

Haodong Zhao, Jinming Hu, Yijie Bai +6 more

Federated Language Model (FedLM) allows a collaborative learning without sharing raw data, yet it introduces a critical vulnerability, as every...

3 months ago cs.CR PDF

Attack HIGH

The Mirror Design Pattern: Strict Data Geometry over Model Scale for Prompt Injection Detection

J Alex Corll

Prompt injection defenses are often framed as semantic understanding problems and delegated to increasingly large neural detectors. For the first...

3 months ago cs.CR cs.AI PDF

Attack HIGH

Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover

Indranil Halder, Annesya Banerjee, Cengiz Pehlevan

Adversarial attacks can reliably steer safety-aligned large language models toward unsafe behavior. Empirically, we find that adversarial...

3 months ago cs.LG cs.AI PDF

Attack HIGH

Enhancing Network Intrusion Detection Systems: A Multi-Layer Ensemble Approach to Mitigate Adversarial Attacks

Nasim Soltani, Shayan Nejadshamsi, Zakaria Abou El Houda +4 more

Adversarial examples can represent a serious threat to machine learning (ML) algorithms. If used to manipulate the behaviour of ML-based Network...

3 months ago cs.CR cs.AI PDF

Attack HIGH

Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems

Scott Thornton

Retrieval-Augmented Generation (RAG) systems extend large language models (LLMs) with external knowledge sources but introduce new attack surfaces...

3 months ago cs.CR cs.AI cs.LG PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial