AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security, covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 661–680 of 748 papers

Attack HIGH

Injection, Attack and Erasure: Revocable Backdoor Attacks via Machine Unlearning

Baogang Song, Dongdong Zhao, Jianwen Xiang +2 more

Backdoor attacks pose a persistent security risk to deep neural networks (DNNs) due to their stealth and durability. While recent research has...

8 months ago cs.CR cs.AI

Attack HIGH

RAID: Refusal-Aware and Integrated Decoding for Jailbreaking LLMs

Tuan T. Nguyen, John Le, Thai T. Vu +2 more

Large language models (LLMs) achieve impressive performance across diverse tasks yet remain vulnerable to jailbreak attacks that bypass safety...

8 months ago cs.CL

Attack HIGH

PromptLocate: Localizing Prompt Injection Attacks

Yuqi Jia, Yupei Liu, Zedian Shao +2 more

Prompt injection attacks deceive a large language model into completing an attacker-specified task instead of its intended task by contaminating its...

8 months ago cs.CR cs.AI

Attack HIGH

Unveiling the Vulnerability of Graph-LLMs: An Interpretable Multi-Dimensional Adversarial Attack on TAGs

Bowen Fan, Zhilin Guo, Xunkai Li +5 more

Graph Neural Networks (GNNs) have become a pivotal framework for modeling graph-structured data, enabling a wide range of applications from social...

8 months ago cs.LG

Attack HIGH

HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities

Xiaoxue Ren, Penghao Jiang, Kaixin Li +6 more

Web applications are prime targets for cyberattacks as gateways to critical services and sensitive data. Traditional penetration testing is costly...

8 months ago cs.CR cs.CL

Attack HIGH

Fairness-Constrained Optimization Attack in Federated Learning

Harsh Kasyap, Minghong Fang, Zhuqing Liu +2 more

Federated learning (FL) is a privacy-preserving machine learning technique that facilitates collaboration among participants across demographics. FL...

8 months ago cs.LG cs.CR

Attack HIGH

LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings

Ting Li, Yang Yang, Yipeng Yu +3 more

Adversarial attacks on knowledge graph embeddings (KGE) aim to disrupt the model's ability of link prediction by removing or inserting triples. A...

8 months ago cs.CL cs.CR

Attack HIGH

Attacks by Content: Automated Fact-checking is an AI Security Issue

Michael Schlichtkrull

When AI agents retrieve and reason over external documents, adversaries can manipulate the data they receive to subvert their behaviour. Previous...

8 months ago cs.CL cs.AI

Attack HIGH

RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation

Vasilije Stambolic, Aritra Dhar, Lukas Cavigelli

Retrieval-Augmented Generation (RAG) increases the reliability and trustworthiness of the LLM response and reduces hallucination by eliminating the...

8 months ago cs.CR cs.AI

Attack HIGH

DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models

Zonghuan Xu, Jiayu Li, Yunhan Zhao +3 more

Vision-Language-Action (VLA) models map multimodal perception and language instructions to executable robot actions, making them particularly...

8 months ago cs.CR cs.AI cs.RO

Attack HIGH

SASER: Stego attacks on open-source LLMs

Ming Tan, Wei Li, Hu Tao +4 more

Open-source large language models (LLMs) have demonstrated considerable dominance over proprietary LLMs in resolving neural processing tasks, thanks...

8 months ago cs.CR cs.AI

Attack HIGH

ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test

Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng +2 more

The integration of Large Language Models (LLMs) into computer applications has introduced transformative capabilities but also significant security...

8 months ago cs.CR cs.AI cs.CL

Attack HIGH

MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation

Wentian Zhu, Zhen Xiang, Wei Niu +1 more

Unlike regular tokens derived from existing text corpora, special tokens are artificially created to annotate structured conversations during the...

8 months ago cs.CR cs.AI

Attack HIGH

ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking

Yutao Wu, Xiao Liu, Yinghui Li +5 more

Knowledge poisoning poses a critical threat to Retrieval-Augmented Generation (RAG) systems by injecting adversarial content into knowledge bases,...

8 months ago cs.CL cs.AI cs.CR

Attack HIGH

A Systematic Study on Generating Web Vulnerability Proof-of-Concepts Using Large Language Models

Mengyao Zhao, Kaixuan Li, Lyuye Zhang +4 more

Recent advances in Large Language Models (LLMs) have brought remarkable progress in code understanding and reasoning, creating new opportunities and...

8 months ago cs.SE

Attack HIGH

Adversarial Attacks on Downstream Weather Forecasting Models: Application to Tropical Cyclone Trajectory Prediction

Yue Deng, Francisco Santos, Pang-Ning Tan +1 more

Deep learning based weather forecasting (DLWF) models leverage past weather observations to generate future forecasts, supporting a wide range of...

8 months ago cs.LG cs.CR stat.ML

Attack HIGH

Text Prompt Injection of Vision Language Models

Ruizhe Zhu

The widespread application of large vision language models has significantly raised safety concerns. In this project, we investigate text prompt...

8 months ago cs.CL cs.CV

Attack HIGH

Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Mikhail Terekhov, Alexander Panfilov, Daniil Dzenhaliou +4 more

AI control protocols serve as a defense mechanism to stop untrusted LLM agents from causing harm in autonomous settings. Prior work treats this as a...

8 months ago cs.LG cs.AI cs.CR

Attack HIGH

Provable Watermarking for Data Poisoning Attacks

Yifan Zhu, Lijia Yu, Xiao-Shan Gao

In recent years, data poisoning attacks have been increasingly designed to appear harmless and even beneficial, often with the intention of verifying...

8 months ago cs.CR cs.LG

Attack HIGH

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections

Milad Nasr, Nicholas Carlini, Chawin Sitawarin +11 more

How should we evaluate the robustness of language model defenses? Current defenses against jailbreaks and prompt injections (which aim to prevent an...

8 months ago cs.LG cs.CR

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended, covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
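
As a rough illustration of what "classified by type and by relevance" could look like in data terms, here is a minimal Python sketch. It is not AI Threat Alert's actual implementation: the `Paper` record, its field names, and the `filter_papers` and `page_bounds` helpers are all hypothetical, and the page size of 20 is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Paper:
    """Hypothetical record for one indexed paper (field names are assumptions)."""
    title: str
    paper_type: str                   # "attack", "defense", "benchmark", "survey", or "tool"
    relevance: str                    # "high" or "medium"
    categories: Tuple[str, ...] = ()  # arXiv categories, e.g. ("cs.CR", "cs.AI")


def filter_papers(
    papers: Sequence[Paper],
    paper_type: Optional[str] = None,
    relevance: Optional[str] = None,
) -> List[Paper]:
    """Keep papers matching the given type and relevance; None means no filter."""
    return [
        p for p in papers
        if (paper_type is None or p.paper_type == paper_type)
        and (relevance is None or p.relevance == relevance)
    ]


def page_bounds(page: int, page_size: int, total: int) -> Tuple[int, int]:
    """Return the 1-based (first, last) item positions shown on a 1-based page."""
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total)
    return first, last
```

Under these assumptions, `filter_papers(papers, paper_type="attack", relevance="high")` would return only high-relevance attack papers, and `page_bounds(34, 20, 748)` gives `(661, 680)`, which matches a listing that shows items 661 to 680 of 748.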

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
