AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total

3,023
Attack

1,175
Benchmark

866
Defense

407
Tool

319
Survey

176

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 201–220 of 748 papers

Clear filters

Attack HIGH

Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models

Zonghao Ying, Haowen Dai, Lianyu Hu +5 more

Modern text-to-image (T2I) models can now render legible, paragraph-length text, enabling a fundamentally new class of misuse. We identify and...

2 months ago cs.CV PDF

Attack HIGH

Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models

Zonghao Ying, Haowen Dai, Lianyu Hu +5 more

Modern text-to-image (T2I) models can now render legible, paragraph-length text, enabling a fundamentally new class of misuse. We identify and...

2 months ago cs.CV PDF

Attack HIGH

Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models

Yiyang Zhang, Chaojian Yu, Ziming Hong +4 more

Multimodal pretrained models are vulnerable to backdoor attacks, yet most existing methods rely on visual or multimodal triggers, which are...

2 months ago cs.CR cs.LG PDF

Attack HIGH

Do No Harm: Exposing Hidden Vulnerabilities of LLMs via Persona-based Client Simulation Attack in Psychological Counseling

Qingyang Xu, Yaling Shen, Stephanie Fong +7 more

The increasing use of large language models (LLMs) in mental healthcare raises safety concerns in high-stakes therapeutic interactions. A key...

2 months ago cs.CL PDF

Attack HIGH

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Siyuan Li, Zehao Liu, Xi Lin +6 more

As Large Language Models (LLMs) are increasingly deployed in complex applications, their vulnerability to adversarial attacks raises urgent safety...

2 months ago cs.CR cs.AI PDF

Attack HIGH

SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits

Zikai Zhang, Rui Hu, Olivera Kotevska +1 more

Large Language Models (LLMs) are powerful tools for answering user queries, yet they remain highly vulnerable to jailbreak attacks. Existing...

2 months ago cs.CR cs.AI PDF

Attack HIGH

No Attacker Needed: Unintentional Cross-User Contamination in Shared-State LLM Agents

Tiankai Yang, Jiate Li, Yi Nian +5 more

LLM-based agents increasingly operate across repeated sessions, maintaining task states to ensure continuity. In many deployments, a single agent...

2 months ago cs.CL cs.AI cs.CR PDF

Attack HIGH

AgentWatcher: A Rule-based Prompt Injection Monitor

Yanting Wang, Wei Zou, Runpeng Geng +1 more

Large language models (LLMs) and their applications, such as agents, are highly vulnerable to prompt injection attacks. State-of-the-art prompt...

2 months ago cs.CR PDF

Attack HIGH

When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion

Jiaqing Li, Zhibo Zhang, Shide Zhou +3 more

Model merging has emerged as a powerful technique for combining specialized capabilities from multiple fine-tuned LLMs without additional training...

2 months ago cs.CR PDF

Attack HIGH

Adversarial Prompt Injection Attack on Multimodal Large Language Models

Meiwen Ding, Song Xia, Chenqi Kong +1 more

Although multimodal large language models (MLLMs) are increasingly deployed in real-world applications, their instruction-following behavior leaves...

2 months ago cs.CV cs.AI PDF

Attack HIGH

Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

Kavindu Herath, Joshua Zhao, Saurabh Bagchi

Backdoor attacks on federated learning (FL) are most often evaluated with synthetic corner patches or out-of-distribution (OOD) patterns that are...

2 months ago cs.CR cs.AI cs.CV PDF

Attack HIGH

Dummy-Aware Weighted Attack (DAWA): Breaking the Safe Sink in Dummy Class Defenses

Yunrui Yu, Xuxiang Feng, Pengda Qin +5 more

Adversarial robustness evaluation faces a critical challenge as new defense paradigms emerge that can exploit limitations in existing assessment...

2 months ago cs.LG cs.CR PDF

Attack HIGH

Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning

Bilgehan Sel, Xuanli He, Alwin Peng +2 more

Fine-tuning APIs offered by major AI providers create new attack surfaces where adversaries can bypass safety measures through targeted fine-tuning....

2 months ago cs.CR cs.AI cs.CL PDF

Attack HIGH

\texttt{ReproMIA}: A Comprehensive Analysis of Model Reprogramming for Proactive Membership Inference Attacks

Chihan Huang, Huaijin Wang, Shuai Wang

The pervasive deployment of deep learning models across critical domains has concurrently intensified privacy concerns due to their inherent...

2 months ago cs.LG cs.CR PDF

Attack HIGH

XSPA: Crafting Imperceptible X-Shaped Sparse Adversarial Perturbations for Transferable Attacks on VLMs

Chengyin Hu, Jiaju Han, Xuemeng Sun +6 more

Vision-language models (VLMs) rely on a shared visual-textual representation space to perform tasks such as zero-shot classification, image...

2 months ago cs.CV PDF

Attack HIGH

Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers

Haochuan Kevin Wang

We present a stage-decomposed analysis of prompt injection attacks against five frontier LLM agents. Prior work measures task-level attack success...

3 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

Shape and Substance: Dual-Layer Side-Channel Attacks on Local Vision-Language Models

Eyal Hadad, Mordechai Guri

On-device Vision-Language Models (VLMs) promise data privacy via local execution. However, we show that the architectural shift toward Dynamic...

3 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

On the Vulnerability of Deep Automatic Modulation Classifiers to Explainable Backdoor Threats

Younes Salmi, Hanna Bogucka

Deep learning (DL) has been widely studied for assisting applications of modern wireless communications. One of the applications is automatic...

3 months ago cs.CR PDF

Attack HIGH

Physical Backdoor Attack Against Deep Learning-Based Modulation Classification

Younes Salmi, Hanna Bogucka

Deep Learning (DL) has become a key technology that assists radio frequency (RF) signal classification applications, such as modulation...

3 months ago cs.CR PDF

Attack HIGH

Mitigating Evasion Attacks in Fog Computing Resource Provisioning Through Proactive Hardening

Younes Salmi, Hanna Bogucka

This paper investigates the susceptibility to model integrity attacks that overload virtual machines assigned by the k-means algorithm used for...

3 months ago cs.CR cs.LG PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial