AI Security Research

AI Threat Alert indexes 3,037+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total: 3,037
Attack: 1,183
Benchmark: 868
Defense: 410
Tool: 319
Survey: 177

Showing benchmark papers 141–160 of 868

Benchmark HIGH

LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments

Chiyu Zhang, Huiqin Yang, Bendong Jiang +8 more

The rapid proliferation of LLM-based autonomous agents in real operating system environments introduces a new category of safety risk beyond content...

1 month ago · cs.CR cs.CL

Benchmark LOW

The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions

Dahlia Shehata, Ming Li

Multi-agent systems (MAS) assume that collaborating inherently improves Large Language Model (LLM) reasoning. We challenge this by demonstrating that...

1 month ago · cs.MA cs.AI

Benchmark HIGH

Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw

Hongwei Yao, Yiming Liu, Yiling He +1 more

Agentic language-model systems increasingly rely on mutable execution contexts, including files, memory, tools, skills, and auxiliary artifacts,...

1 month ago · cs.CR cs.AI

Benchmark LOW

How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue

Hui Lu, Xueyuan Chen, Huimeng Wang +4 more

Full-duplex spoken dialogue requires a model to keep listening while generating its own spoken response. This is challenging for large language...

1 month ago · cs.CL eess.AS

Benchmark MEDIUM

Benchmarking Safety Risks of Knowledge-Intensive Reasoning under Malicious Knowledge Editing

Qinghua Mao, Xi Lin, Jinze Gu +3 more

Large language models (LLMs) increasingly rely on knowledge editing to support knowledge-intensive reasoning, but this flexibility also introduces...

1 month ago · cs.AI cs.CR

Benchmark MEDIUM

The Cartesian Shortcut: Re-evaluate Vision Reasoning in Polar Coordinate Space

Xia Hu, Zhenrui Yue, Brian Potetz +4 more

As current Multimodal Large Language Models rapidly saturate canonical visual reasoning benchmarks, a key question emerges: do these strong scores...

1 month ago · cs.CV cs.AI

Benchmark MEDIUM

MedMeta: A Benchmark for LLMs in Synthesizing Meta-Analysis Conclusion from Medical Studies

Huy Hoang Ha, Benoit Favre, Francois Portet

Large language models (LLMs) have saturated standard medical benchmarks that test factual recall, yet their ability to perform higher-order...

1 month ago · cs.CL cs.AI

Benchmark MEDIUM

Modeling Implicit Conflict Monitoring Mechanisms against Stereotypes in LLMs

Jingshen Zhang, Bo Wang, Yanlin Fu +4 more

In this paper, we study an emergent self-debiasing mechanism against stereotypical content in Large Language Models (LLMs). Unlike traditional...

1 month ago · cs.SI

Benchmark MEDIUM

Don't Click That: Teaching Web Agents to Resist Deceptive Interfaces

Yilin Zhang, Yingkai Hua, Chunyu Wei +2 more

Vision-language model (VLM) based web agents demonstrate impressive autonomous GUI interaction but remain vulnerable to deceptive interface elements....

1 month ago · cs.AI cs.CR

Benchmark HIGH

How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation

Shai Feldman, Yaniv Romano

Evaluating and predicting the performance of large language models (LLMs) in multi-turn conversational settings is critical yet computationally...

1 month ago · cs.LG

Benchmark HIGH

Autonomous Adversary: Red-Teaming in the age of LLM

Mohammad Mamun, Mohamed Gaber, Scott Buffett +1 more

Language Model Agents (LMAs) are emerging as a powerful primitive for augmenting red-team operations. They can support attack planning, adversary...

1 month ago · cs.CR

Benchmark MEDIUM

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

Di Lu, Bo Zhang, Xiyuan Li +5 more

Self-hosted computer-use agents (SHCUAs), such as OpenClaw, combine natural-language interaction with direct access to host-side resources, including...

1 month ago · cs.CR

Benchmark MEDIUM

PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts

Qinfeng Li, Yuntai Bao, Jianghui Hu +5 more

LLM agents rely on prompts to implement task-specific capabilities based on foundation LLMs, making agent prompts valuable intellectual property....

1 month ago · cs.CR cs.AI

Benchmark MEDIUM

LCC-LLM: Leveraging Code-Centric Large Language Models for Malware Attribution

Christopher G. Pedraza Pohlenz, Hassan Jalil Hadi, Ali Hassan +1 more

LLMs are increasingly explored for malware analysis; however, current LLM-based malware attribution remains limited by unsupported indicators and...

1 month ago · cs.CR cs.AI

Benchmark MEDIUM

DataDignity: Training Data Attribution for Large Language Models

Xiaomin Li, Andrzej Banburski-Fahey, Jaron Lanier

Auditing language-model outputs often requires more than judging correctness: an auditor may need to identify which source document most likely...

1 month ago · cs.AI

Benchmark MEDIUM

XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity

Dasol Choi, Eugenia Kim, Jaewon Noh +14 more

Current LLM safety benchmarks are predominantly English-centric and often rely on translation, failing to capture country-specific harms. Moreover,...

1 month ago · cs.CL cs.AI

Benchmark LOW

The Cost of Context: Mitigating Textual Bias in Multimodal Retrieval-Augmented Generation

Hoin Jung, Xiaoqian Wang

While Multimodal Large Language Models (MLLMs) are increasingly integrated with Retrieval-Augmented Generation (RAG) to mitigate hallucinations, the...

1 month ago · cs.CL cs.CV cs.LG

Benchmark MEDIUM

AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use

Chenglin Yang

Modern AI agents execute real-world side effects through tool calls such as file operations, shell commands, HTTP requests, and database queries. A...

1 month ago · cs.AI cs.CR

Benchmark MEDIUM

Graph Reconstruction from Differentially Private GNN Explanations

Rishi Raj Sahoo, Jyotirmaya Shivottam, Subhankar Mishra

Regulatory frameworks such as GDPR increasingly require that ML predictions be accompanied by post-hoc explanations, even when raw data and trained...

1 month ago · cs.LG cs.CR

Benchmark MEDIUM

DECKER: Domain-invariant Embedding for Cross-Keyboard Extraction and Recognition

Bikrant Bikram Pratap Maurya, Nitin Choudhury, Daksh Agarwal +1 more

Acoustic side-channel attacks (ASCA) on keyboards pose a significant security risk, as keystrokes can be inferred from typing acoustics, revealing...

1 month ago · cs.CR cs.SD

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,037+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.