AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 321–340 of 1,983 papers

Defense LOW

Real Images, Worse Judgments: Evaluating Vision-Language Models on Concreteness and Imagery

Yifan Jiang, Ruoxi Ning, Sheng Yao +1 more

Visual inputs are often assumed to improve language understanding in multimodal models. We examine this assumption by asking whether vision-language...

1 month ago cs.CL PDF

Benchmark MEDIUM

Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?

Syed Huma Shah

Modern retrieval-augmented generation (RAG) deployments increasingly rely on caching to reduce token cost and time-to-first-token (TTFT). Prefix-level...

1 month ago cs.CR cs.AI cs.CL PDF

Tool MEDIUM

HARP: Measuring Harm Amplification in Multi-Agent LLM Systems

Md Hafizur Rahman, Zafaryab Haider, Tanzim Mahfuz +1 more

Multi-agent LLM systems decompose workflows across agents, tools, shared context, memory, and decision gates. This modularity improves...

1 month ago cs.CR cs.AI cs.LG PDF

Attack MEDIUM

BAIT: Boundary-Guided Disclosure Escalation via Self-Conditioned Reasoning

Xuan Luo, Yue Wang, Geng Tu +2 more

In this work, we propose BAIT (Boundary-Aware Iterative Trap), a three-step jailbreak framework that approaches malicious goals through internal...

1 month ago cs.CR cs.CL PDF

Benchmark HIGH

Prompt Injection Detection is Regime-Dependent: A Deployment-Aware Evaluation with Interpretable Structural Signals

Akindoyin Akinrele, Shreyank N Gowda

Prompt injection poses a critical threat to the safe deployment of large language models, yet existing detection approaches are typically evaluated...

1 month ago cs.CL cs.CR PDF

Survey MEDIUM

On the Robustness of Machine Unlearning for Vision-Language Models

Yujie Lin, Kaidi Jia, Jiayao Ma +2 more

Vision-language models (VLMs) may memorize undesirable information from training data, motivating growing interest in machine unlearning. In this...

1 month ago cs.CV PDF

Benchmark MEDIUM

AlbanianLLMSafety: A Safety Evaluation Dataset for Large Language Models in Albanian

Wajdi Zaghouani, Kholoud K. Aldous, Isra Fejzullaj

Safety evaluation of Large Language Models (LLMs) has largely focused on high-resource languages, leaving low-resource languages critically...

1 month ago cs.CL PDF

Benchmark LOW

Persistent AI Agents in Academic Research: A Single-Investigator Implementation Case Study

Anas H. Alzahrani

Background: Large language models are typically evaluated as models, benchmarks, or short conversational episodes. Less is known about what happens...

1 month ago cs.MA cs.AI cs.HC PDF

Attack MEDIUM

EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation

Yunbo Long, Haolang Zhao, Lukas Beckenbauer +2 more

Post-trained LLMs are often optimized to align responses with human preferences, making them safe, polite, and conversationally appropriate. In...

1 month ago cs.CL cs.AI PDF

Attack MEDIUM

Cordon-MAS: Defending RAG against Knowledge Poisoning via Information-Flow Control

Zhe Yu, Wenpeng Xing, Gaolei Li +4 more

Retrieval-augmented generation (RAG) increasingly underpins high-stakes applications, yet remains vulnerable to Confundo-style poisoning where...

1 month ago cs.CR cs.AI PDF

Attack HIGH

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Zedian Shao, Charles Fleming, Teodora Baluta

Large language models (LLMs) are often fine-tuned on uncurated text datasets that adversaries can poison. Existing poisoning attacks primarily rely...

1 month ago cs.CR cs.AI cs.LG PDF

Benchmark MEDIUM

GradSentry: Gradient Spectral Entropy for Backdoor Sample Filtering in Large Language Model Fine-Tuning

Haodong Zhao, Tianyi Xu, Tianhang Zhao +2 more

Fine-tuning Large Language Models with untrusted data exposes models to backdoor attacks, where poisoned samples cause targeted misbehavior. Existing...

1 month ago cs.CR PDF

Benchmark MEDIUM

SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks?

Hwiwon Lee, Jiawei Liu, Dongjun Kim +3 more

Large language models (LLMs) now support automated software security tasks, including vulnerability discovery and proof-of-concept (PoC) generation....

1 month ago cs.CR cs.LG PDF

Attack HIGH

Open-Weight LLM Fine-Tuning Defenses are Susceptible to Simple Attacks

Kevin Kuo, Chhavi Yadav, Virginia Smith

Recent defenses for safeguarding open-weight large language models (LLMs) are intended to prevent adversarial usage. Underlying these defenses is an...

1 month ago cs.LG cs.CR PDF

Defense MEDIUM

Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents

Peiran Wang, Ying Li, Yuan Tian

LLM-based agents are increasingly deployed in high-stakes scenarios such as email management, financial transactions, and code execution, where they...

1 month ago cs.CR PDF

Benchmark MEDIUM

Constitutional Arms Races in the Public Goods Game: Co-Evolving LLM Constitutions Under Cooperation-Defection Pressure

Ujwal Kumar, Arth Singh, Hershraj Niranjani +5 more

Frontier LLM agents engage in blackmail, sabotage, and document leaks under goal conflicts in agentic settings, exposing limitations of alignment...

1 month ago cs.MA cs.GT cs.NE PDF

Attack HIGH

Erased but Exploitable: Black-box Embedding-Aware Prompting Against Unlearned Text-to-Image Diffusion Models

Arian Komaei Koma, Seyed Amir Kasaei, AmirMahdi Sadeghzadeh +1 more

Machine unlearning aims to remove specific concepts from pretrained text-to-image diffusion models, yet several white- and black-box attacks have...

1 month ago cs.CV cs.AI PDF

Defense MEDIUM

Curriculum Learning for Safety Alignment

Sandeep Kumar, Virginia Smith, Chhavi Yadav

Direct Preference Optimisation (DPO) is widely used for safety alignment in large language models. However, prior work shows it is brittle and...

1 month ago cs.LG cs.AI PDF

Attack HIGH

Intelligent Detection and Mitigation of Carpet-Bombing DDoS Attacks in SDN Using Retrieval-Augmented Generation and Large Language Models

Mohammed N. Swileh, Shengli Zhang, Kai Lei

Software-Defined Networking (SDN) provides flexible and programmable network management; however, its centralized control architecture remains highly...

1 month ago cs.CR cs.AI cs.NI PDF

Tool HIGH

AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

Faruk Alpay, Taylan Alpay

LLM agents process trusted instructions, retrieved records, and tool observations through a common generative channel. This conflates data flow with...

1 month ago cs.CR PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
