AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.


Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176


Showing 181–200 of 3,023 papers

Tool HIGH

JailbreakOPT: Tool-Assisted Iterative Jailbreak Prompt Optimization

Ge Shi, Jun Yin, Donglin Xie +3 more

Jailbreak attacks expose persistent safety weaknesses in large language models (LLMs), but existing stateless single-turn methods face a trade-off:...

2 weeks ago cs.CR cs.AI PDF

Benchmark MEDIUM

Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models

Malikeh Ehghaghi, Boglárka Ecsedi, Marsha Chechik +1 more

Adversarial robustness evaluations of large language models (LLMs) typically report attack success rate (ASR) under fixed query budgets, implicitly...

2 weeks ago cs.LG cs.AI cs.CR PDF

Attack HIGH

Context-Based Adversarial Attacks on AI Code Generators: Vulnerability Analysis and Implications

Walther A. Del Orbe, John D. Hastings, Varghese Vaidyan

AI-powered code generation systems have transformed software development but introduce critical inference-time security vulnerabilities. This...

2 weeks ago cs.CR cs.SE PDF

Benchmark MEDIUM

It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO

Naihao Deng, Yilun Zhu, Naichen Shi +2 more

Warning: This paper contains several toxic and offensive statements. Modern large language models (LLMs) are typically aligned through large-scale...

2 weeks ago cs.CL PDF

Defense MEDIUM

Comparative Analysis of Inference-Time Defense Methods for Multimodal Large Language Models

Bulat Nutfullin, Vladimir Evgrafov, Dmitry Namiot

Multimodal large language models (MLLMs) now appear in safety-critical applications, but the visual channel leaves them open to adversarial attacks...

2 weeks ago cs.CR PDF

Attack MEDIUM

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

Lena S. Bolliger, Lena A. Jäger

Production LLMs receive instructions from sources with very different levels of trust, yet attend to every token with uniform architectural...

2 weeks ago cs.CR cs.CL PDF

Benchmark HIGH

Securing Code Understanding: Detecting Natural Backdoor Vulnerability in Code Language Models

Yuchen Chen, Weisong Sun, Haocheng Huang +11 more

Code Language Models (CodeLMs) have become integral to software engineering, significantly advancing code intelligence tasks. However, their...

2 weeks ago cs.CR cs.SE PDF

Benchmark HIGH

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

Yuchen Ling, Shengcheng Yu, Zhenyu Chen +1 more

Large language model (LLM) agents are rapidly moving from conversational interfaces to software components that plan, invoke tools, maintain memory,...

2 weeks ago cs.CR cs.AI PDF

Attack MEDIUM

MemVenom: Triggered Poisoning of Multimodal Memories in Web Agents

Yv Zhang, Hao Sun, Hao Fang +5 more

External memory has become a core component of modern web agents, enabling long-horizon reasoning through the retrieval of past experiences. However,...

2 weeks ago cs.CR cs.LG PDF

Attack MEDIUM

Improving Adversarial Transferability on Vision-Language Pre-training Models via Surrogate-Specific Bias Correction

Lijia Yu, Jiuxin Cao, Yuchen Qiang +3 more

Adversarial examples reveal vulnerabilities in Vision-Language Pre-training (VLP) models and provide insights for improving robustness. A key...

2 weeks ago cs.CV cs.AI cs.CR PDF

Attack HIGH

Assessing Automated Prompt Injection Attacks in Agentic Environments

David Hofer, Edoardo Debenedetti, Florian Tramèr

Indirect prompt injection poses a critical threat to LLM agents that interact with untrusted external data, yet automated attack methods--proven...

2 weeks ago cs.CR cs.AI PDF

Benchmark MEDIUM

AgentCanary: A Security Evaluation Framework for Autonomous AI Agents in Real Executable Environments

Peiyang Li, Songping Wang, Yi Huang +9 more

Autonomous AI agents have driven the transition from conversation to task execution, shifting security failures from textual deception to system...

2 weeks ago cs.CR PDF

Attack MEDIUM

Advancing the State-of-the-Art in Empirical Privacy Auditing

Nicole Mitchell, Galen Andrew, Arun Ganesh +2 more

Parameter-efficient fine-tuning of large language models (LLMs) can exhibit problematic memorization of individual training examples. Empirical...

2 weeks ago cs.LG cs.AI cs.CL PDF

Attack MEDIUM

Semantic Multi-Agent Intrusion Detection for IoT: Zero-Day and Adversarial Threats with Risk-Aware Reasoning

Saeid Jamshidi

The rapid proliferation of Internet of Things (IoT) devices has enabled unprecedented automation and connectivity, but it has also substantially...

2 weeks ago cs.CR PDF

Benchmark MEDIUM

Game-Theoretic Multi-Agent Control for Robust Contextual Reasoning in LLMs

Saeid Jamshidi, Amin Nikanjam, Arghavan Moradi Dakhel +2 more

Large Language Models (LLMs) in multi-turn interactions maintain evolving context rather than generating isolated responses, making them vulnerable...

2 weeks ago cs.CR cs.MA PDF

Benchmark MEDIUM

Baseline-Free Policy Optimization for Neural Combinatorial Optimization

Carlos S. Sepúlveda, Gonzalo A. Ruz

Neural combinatorial optimization (NCO) trains autoregressive policies to solve routing problems. The standard training algorithm, REINFORCE with a...

2 weeks ago cs.LG cs.AI cs.RO PDF

Benchmark HIGH

Benchmarking and Exploring the Capabilities of LLMs for Attack Investigations

Aniket Anand, Yiwei Hou, Daniel Fields +4 more

This paper presents AuditBench, a new benchmark dataset for evaluating the capabilities of LLMs at investigating security-related system audit logs....

2 weeks ago cs.CR cs.CL PDF

Attack HIGH

Alignment Defends LLMs from Property Inference Attacks

Pengrun Huang, Chhavi Yadav, Ruihan Wu +1 more

Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets that may contain sensitive, dataset-level properties. Recent...

2 weeks ago cs.LG cs.CR PDF

Tool MEDIUM

RadKey: An LLM-Guided RF Backscatter System for Through-Wall Keystroke Inference

Qijun Wang, Chunqi Qian, Huacheng Zeng

In today's digitally connected world, keyboards remain the primary interface for inputting sensitive information, making them a persistent target for...

2 weeks ago cs.CR PDF

Survey MEDIUM

SoK: Colluding Adversaries in Machine Learning Pipelines

Vasisht Duddu, Lipeng He, Asim Waheed +1 more

Machine learning (ML) models are susceptible to various security, privacy, and fairness risks. Adversaries with different characteristics (i.e.,...

2 weeks ago cs.CR cs.LG PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
