AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 221–240 of 562 papers

Benchmark LOW

DreamKG: A KG-Augmented Conversational System for People Experiencing Homelessness

Javad M Alizadeh, Genhui Zheng, Chiu C Tan +7 more

People experiencing homelessness (PEH) face substantial barriers to accessing timely, accurate information about community services. DreamKG...

2 months ago cs.AI PDF

Benchmark MEDIUM

RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience

Hanbo Huang, Xuan Gong, Yiran Zhang +2 more

Large language model (LLM) watermarking has emerged as a promising approach for detecting and attributing AI-generated text, yet its robustness to...

2 months ago cs.CR PDF

Benchmark LOW

From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python

Jinhua Wang, Biswa Sengupta

Cross-language migration of large software systems is a persistent engineering challenge, particularly when the source codebase evolves rapidly. We...

2 months ago cs.SE cs.AI PDF

Benchmark MEDIUM

RedShell: A Generative AI-Based Approach to Ethical Hacking

Ricardo Bessa, Rui Claro, João Trindade +1 more

The application of Machine Learning techniques in code generation is now a common practice for most developers. Tools such as ChatGPT from OpenAI...

2 months ago cs.CR PDF

Benchmark LOW

Beyond RAG for Cyber Threat Intelligence: A Systematic Evaluation of Graph-Based and Agentic Retrieval

Dzenan Hamzic, Florian Skopik, Max Landauer +2 more

Cyber threat intelligence (CTI) analysts must answer complex questions over large collections of narrative security reports. Retrieval-augmented...

2 months ago cs.AI cs.CR PDF

Benchmark MEDIUM

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models

Xiaomeng Hu, Yinger Zhang, Fei Huang +7 more

AI agents are expected to perform professional work across hundreds of occupational domains (from emergency department triage to nuclear reactor...

2 months ago cs.CL PDF

Benchmark MEDIUM

DuCodeMark: Dual-Purpose Code Dataset Watermarking via Style-Aware Watermark-Poison Design

Yuchen Chen, Yuan Xiao, Chunrong Fang +2 more

The proliferation of large language models for code (CodeLMs) and open-source contributions has heightened concerns over unauthorized use of source...

2 months ago cs.CR PDF

Benchmark LOW

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

Wenbo Hu, Xin Chen, Yan Gao-Tian +3 more

Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal...

2 months ago cs.CV cs.AI cs.CL PDF

Benchmark HIGH

PIArena: A Platform for Prompt Injection Evaluation

Runpeng Geng, Chenlong Yin, Yanting Wang +2 more

Prompt injection attacks pose serious security risks across a wide range of real-world applications. While receiving increasing attention, the...

2 months ago cs.CR cs.AI cs.CL PDF

Benchmark MEDIUM

Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing

Wenhao Yuan, Chenchen Lin, Jian Chen +3 more

In large language model (LLM) agents, reasoning trajectories are treated as reliable internal beliefs for guiding actions and updating memory....

2 months ago cs.AI cs.CL PDF

Benchmark MEDIUM

ADAG: Automatically Describing Attribution Graphs

Aryaman Arora, Zhengxuan Wu, Jacob Steinhardt +1 more

In language model interpretability research, circuit tracing aims to identify which internal features causally contributed to a particular...

2 months ago cs.CL PDF

Benchmark MEDIUM

ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training

Yu Liang, Liangxin Liu, Longzheng Wang +5 more

Generative reward models (GRMs) have emerged as a promising approach for aligning Large Language Models (LLMs) with human preferences by offering...

2 months ago cs.AI cs.CL cs.LG PDF

Benchmark LOW

M-ArtAgent: Evidence-Based Multimodal Agent for Implicit Art Influence Discovery

Hanyi Liu, Zhonghao Jiu, Minghao Wang +2 more

Implicit artistic influence, although visually plausible, is often undocumented and thus poses a historically constrained attribution problem:...

2 months ago cs.AI PDF

Benchmark MEDIUM

Validated Intent Compilation for Constrained Routing in LEO Mega-Constellations

Yuanhang Li

Operating LEO mega-constellations requires translating high-level operator intents ("reroute financial traffic away from polar links under 80 ms")...

2 months ago cs.CR cs.AI PDF

Benchmark HIGH

PoC-Adapt: Semantic-Aware Automated Vulnerability Reproduction with LLM Multi-Agents and Reinforcement Learning-Driven Adaptive Policy

Phan The Duy, Nguyen Viet Duy, Khoa Ngo-Khanh +2 more

While recent approaches leverage large language models (LLMs) and multi-agent pipelines to automatically generate proof-of-concept (PoC) exploits...

2 months ago cs.CR PDF

Benchmark HIGH

Uncovering Linguistic Fragility in Vision-Language-Action Models via Diversity-Aware Red Teaming

Baoshun Tong, Haoran He, Ling Pan +2 more

Vision-Language-Action (VLA) models have achieved remarkable success in robotic manipulation. However, their robustness to linguistic nuances remains...

2 months ago cs.RO cs.CV PDF

Benchmark MEDIUM

Compiled AI: Deterministic Code Generation for LLM-Based Workflow Automation

Geert Trooskens, Aaron Karlsberg, Anmol Sharma +6 more

We study compiled AI, a paradigm in which large language models generate executable code artifacts during a compilation phase, after which workflows...

2 months ago cs.SE cs.AI PDF

Benchmark LOW

LiveFact: A Dynamic, Time-Aware Benchmark for LLM-Driven Fake News Detection

Cheng Xu, Changhong Jin, Yingjie Niu +5 more

The rapid development of Large Language Models (LLMs) has transformed fake news detection and fact-checking tasks from simple classification to...

2 months ago cs.CL cs.AI PDF

Benchmark LOW

Forgetting to Witness: Efficient Federated Unlearning and Its Visible Evaluation

Houzhe Wang, Xiaojie Zhu, Chi Chen

With the increasing importance of data privacy and security, federated unlearning has emerged as a novel research field dedicated to ensuring that...

2 months ago cs.LG cs.CR PDF

Benchmark LOW

Discovering Failure Modes in Vision-Language Models using RL

Kanishk Jain, Qian Yang, Shravan Nayak +3 more

Vision-language Models (VLMs), despite achieving strong performance on multimodal benchmarks, often misinterpret straightforward visual concepts that...

2 months ago cs.CV cs.AI PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
