AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 261–280 of 562 papers


Benchmark LOW

From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition

Francesco Gentile, Nicola Dall'Asen, Francesco Tonini +3 more

As vision-language models are deployed at scale, understanding their internal mechanisms becomes increasingly critical. Existing interpretability...

3 months ago cs.CV PDF

Benchmark MEDIUM

Environment-Grounded Multi-Agent Workflow for Autonomous Penetration Testing

Michael Somma, Markus Großpointner, Paul Zabalegui +2 more

The increasing complexity and interconnectivity of digital infrastructures make scalable and reliable security assessment methods essential. Robotic...

3 months ago cs.RO cs.AI PDF

Benchmark MEDIUM

Walma: Learning to See Memory Corruption in WebAssembly

Oussama Draissi, Mark Günzel, Ahmad-Reza Sadeghi +1 more

WebAssembly's (Wasm) monolithic linear memory model facilitates memory corruption attacks that can escalate to cross-site scripting in browsers or go...

3 months ago cs.CR cs.LG PDF

Benchmark MEDIUM

Do World Action Models Generalize Better than VLAs? A Robustness Study

Zhanguang Zhang, Zhiyuan Li, Behnam Rahmati +10 more

Robot action planning in the real world is challenging as it requires not only understanding the current state of the environment but also predicting...

3 months ago cs.RO PDF

Benchmark MEDIUM

SecureBreak -- A dataset towards safe and secure models

Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera

Large language models are becoming pervasive core components in many real-world applications. As a consequence, security alignment represents a...

3 months ago cs.CR cs.AI cs.CL PDF

Benchmark LOW

Mirage: The Illusion of Visual Understanding

Mohammad Asadi, Jack W. O'Sullivan, Fang Cao +5 more

Multimodal AI systems have achieved remarkable performance across a broad range of real-world tasks, yet the mechanisms underlying visual-language...

3 months ago cs.AI PDF

Benchmark LOW

Adaptive Robust Estimator for Multi-Agent Reinforcement Learning

Zhongyi Li, Wan Tian, Jingyu Chen +8 more

Multi-agent collaboration has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models, yet it suffers from...

3 months ago cs.AI PDF

Benchmark LOW

WARBENCH: A Comprehensive Benchmark for Evaluating LLMs in Military Decision-Making

Zongjie Li, Chaozheng Wang, Yuchong Xie +2 more

Large Language Models are increasingly being considered for deployment in safety-critical military applications. However, current benchmarks suffer...

3 months ago cs.CY cs.AI PDF

Benchmark LOW

SkillProbe: Security Auditing for Emerging Agent Skill Marketplaces via Multi-Agent Collaboration

Zihan Guo, Zhiyu Chen, Xiaohang Nie +3 more

With the rapid evolution of Large Language Model (LLM) agent ecosystems, centralized skill marketplaces have emerged as pivotal infrastructure for...

3 months ago cs.CR cs.SE PDF

Benchmark LOW

BenchBench: Benchmarking Automated Benchmark Generation

Yandan Zheng, Haoran Luo, Zhenghong Lin +2 more

Benchmarks are the de facto standard for tracking progress in large language models (LLMs), yet static test sets can rapidly saturate, become...

3 months ago cs.CL PDF

Benchmark HIGH

AEGIS: From Clues to Verdicts -- Graph-Guided Deep Vulnerability Reasoning via Dialectics and Meta-Auditing

Sen Fang, Weiyuan Ding, Zhezhen Cao +2 more

Large Language Models (LLMs) are increasingly adopted for vulnerability detection, yet their reasoning remains fundamentally unsound. We identify a...

3 months ago cs.SE cs.AI cs.CR PDF

Benchmark MEDIUM

Unveiling the Security Risks of Federated Learning in the Wild: From Research to Practice

Jiahao Chen, Zhiming Zhao, Yuwen Pu +4 more

Federated learning (FL) has attracted substantial attention in both academia and industry, yet its practical security posture remains poorly...

3 months ago cs.CR PDF

Benchmark MEDIUM

LJ-Bench: Ontology-Based Benchmark for U.S. Crime

Hung Yun Tseng, Wuzhen Li, Blerina Gkotse +1 more

The potential of Large Language Models (LLMs) to provide harmful information remains a significant concern due to the vast breadth of illegal queries...

3 months ago cs.LG PDF

Benchmark MEDIUM

The production of meaning in the processing of natural language

Christopher J. Agostino, Quan Le Thien, Nayan D'Souza +1 more

Understanding the fundamental mechanisms governing the production of meaning in the processing of natural language is critical for designing safe,...

3 months ago cs.CL cs.AI cs.HC PDF

Benchmark MEDIUM

Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

Fazhong Liu, Zhuoyan Chen, Tu Lan +6 more

Autonomous coding agents are increasingly integrated into software development workflows, offering capabilities that extend beyond code suggestion to...

3 months ago cs.CR cs.AI PDF

Benchmark LOW

What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time

Dong Yan, Jian Liang, Yanbo Wang +3 more

Test-Time Reinforcement Learning (TTRL) enables Large Language Models (LLMs) to enhance reasoning capabilities on unlabeled test streams by deriving...

3 months ago cs.LG cs.AI PDF

Benchmark LOW

Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

Zou Qiang

Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under...

3 months ago cs.AI cs.CL PDF

Benchmark MEDIUM

Functional Subspace Watermarking for Large Language Models

Zikang Ding, Junhao Li, Suling Wu +3 more

Model watermarking utilizes internal representations to protect the ownership of large language models (LLMs). However, these features inevitably...

3 months ago cs.CR cs.AI PDF

Benchmark LOW

The Validity Gap in Health AI Evaluation: A Cross-Sectional Analysis of Benchmark Composition

Alvin Rajkomar, Pavan Sudarshan, Angela Lai +1 more

Background: Clinical trials rely on transparent inclusion criteria to ensure generalizability. In contrast, benchmarks validating health-related...

3 months ago cs.AI PDF

Benchmark HIGH

Machine Learning for Network Attacks Classification and Statistical Evaluation of Adversarial Learning Methodologies for Synthetic Data Generation

Iakovos-Christos Zarkadis, Christos Douligeris

Supervised detection of network attacks has always been a critical part of network intrusion detection systems (NIDS). Nowadays, in a pivotal time...

3 months ago cs.CR cs.AI stat.AP PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
