AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 761–780 of 866 papers

Benchmark LOW

When Intelligence Fails: An Empirical Study on Why LLMs Struggle with Password Cracking

Mohammad Abdul Rehman, Syed Imad Ali Shah, Abbas Anwar +2 more

The remarkable capabilities of Large Language Models (LLMs) in natural language understanding and generation have sparked interest in their potential...

8 months ago cs.CR cs.AI cs.LG PDF

Benchmark LOW

DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios

Yao Huang, Yitong Sun, Yichi Zhang +3 more

Despite the remarkable advances of Large Language Models (LLMs) across diverse cognitive tasks, the rapid enhancement of these capabilities also...

8 months ago cs.CL cs.AI cs.LG PDF

Benchmark LOW

VERA-MH Concept Paper

Luca Belli, Kate Bentley, Will Alexander +5 more

We introduce VERA-MH (Validation of Ethical and Responsible AI in Mental Health), an automated evaluation of the safety of AI chatbots used in mental...

8 months ago cs.CY cs.AI cs.HC PDF

Benchmark HIGH

LLM Agents for Automated Web Vulnerability Reproduction: Are We There Yet?

Bin Liu, Yanjie Zhao, Guoai Xu +1 more

Large language model (LLM) agents have demonstrated remarkable capabilities in software engineering and cybersecurity tasks, including code...

8 months ago cs.SE cs.CR PDF

Benchmark HIGH

Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks

Trilok Padhi, Pinxian Lu, Abdulkadir Erol +5 more

Large Language Model (LLM) agents are powering a growing share of interactive web applications, yet remain vulnerable to misuse and harm. Prior...

8 months ago cs.AI PDF

Benchmark LOW

Toward Cybersecurity-Expert Small Language Models

Matan Levi, Daniel Ohayon, Ariel Blobstein +3 more

Large language models (LLMs) are transforming everyday applications, yet deployment in cybersecurity lags due to a lack of high-quality,...

8 months ago cs.CL cs.AI cs.CR PDF

Benchmark MEDIUM

One Bug, Hundreds Behind: LLMs for Large-Scale Bug Discovery

Qiushi Wu, Yue Xiao, Dhilung Kirat +3 more

Fixing bugs in large programs is a challenging task that demands substantial time and effort. Once a bug is found, it is reported to the project...

8 months ago cs.SE cs.AI PDF

Benchmark MEDIUM

When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?

Yibo Peng, James Song, Lei Li +6 more

Code agents are increasingly trusted to autonomously fix bugs on platforms such as GitHub, yet their security evaluation focuses almost exclusively...

8 months ago cs.CR cs.SE PDF

Benchmark LOW

GAPS: A Clinically Grounded, Automated Benchmark for Evaluating AI Clinicians

Xiuyuan Chen, Tao Sun, Dexin Su +38 more

Current benchmarks for AI clinician systems, often based on multiple-choice exams or manual rubrics, fail to capture the depth, robustness, and...

8 months ago cs.CL PDF

Benchmark MEDIUM

Risk-adaptive Activation Steering for Safe Multimodal Large Language Models

Jonghyun Park, Minhyuk Seo, Jonghyun Choi

One of the key challenges of modern AI models is ensuring that they provide helpful responses to benign queries while refusing malicious ones. But...

8 months ago cs.CV PDF

Benchmark HIGH

Selective Adversarial Attacks on LLM Benchmarks

Ivan Dubrovsky, Anastasia Orlova, Illarion Iov +3 more

Benchmarking outcomes increasingly govern trust, selection, and deployment of LLMs, yet these evaluations remain vulnerable to semantically...

8 months ago cs.LG PDF

Benchmark MEDIUM

Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers

Xin Zhao, Xiaojun Chen, Bingshan Liu +3 more

Large language models (LLMs) with Mixture-of-Experts (MoE) architectures achieve impressive performance and efficiency by dynamically routing inputs...

8 months ago cs.CR PDF

Benchmark LOW

Taming the Fragility of KV Cache Eviction in LLM Inference

Yuan Feng, Haoyu Guo, JunLin Lv +2 more

Large language models have revolutionized natural language processing, yet their deployment remains hampered by the substantial memory and runtime...

8 months ago cs.CL PDF

Benchmark MEDIUM

SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs

Juan Ren, Mark Dras, Usman Naseem

Large Vision-Language Models (LVLMs) unlock powerful multimodal reasoning but also expand the attack surface, particularly through adversarial inputs...

8 months ago cs.CL PDF

Benchmark LOW

TRUSTVIS: A Multi-Dimensional Trustworthiness Evaluation Framework for Large Language Models

Ruoyu Sun, Da Song, Jiayang Song +2 more

As Large Language Models (LLMs) continue to revolutionize Natural Language Processing (NLP) applications, critical concerns about their...

8 months ago cs.SE cs.AI cs.CL PDF

Benchmark MEDIUM

A Multilingual, Large-Scale Study of the Interplay between LLM Safeguards, Personalisation, and Disinformation

João A. Leite, Arnav Arora, Silvia Gargova +5 more

Large Language Models (LLMs) can generate human-like disinformation, yet their ability to personalise such content across languages and demographics...

8 months ago cs.CL PDF

Benchmark MEDIUM

Shallow Robustness, Deep Vulnerabilities: Multi-Turn Evaluation of Medical LLMs

Blazej Manczak, Eric Lin, Francisco Eiras +2 more

Large language models (LLMs) are rapidly transitioning into medical clinical use, yet their reliability under realistic, multi-turn interactions...

8 months ago cs.CL cs.AI PDF

Benchmark HIGH

MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents

Dongsen Zhang, Zekun Li, Xu Luo +3 more

The Model Context Protocol (MCP) standardizes how large language model (LLM) agents discover, describe, and call external tools. While MCP unlocks...

8 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Locket: Robust Feature-Locking Technique for Language Models

Lipeng He, Vasisht Duddu, N. Asokan

Chatbot providers (e.g., OpenAI) rely on tiered subscription schemes to generate revenue, offering basic models for free users, and advanced models...

8 months ago cs.CR cs.LG PDF

Benchmark MEDIUM

Deep Research Brings Deeper Harm

Shuo Chen, Zonggen Li, Zhen Han +7 more

Deep Research (DR) agents built on Large Language Models (LLMs) can perform complex, multi-step research by decomposing tasks, retrieving online...

8 months ago cs.CR cs.CL PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
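
As a rough illustration of what classifying by type and relevance could look like in practice, here is a minimal Python sketch. It is not AI Threat Alert's actual schema or API: the `Paper` record, its field names, and the `filter_papers` and `page_bounds` helpers are all hypothetical, and the page size of 20 is an assumption inferred from the "Showing 761–780 of 866 papers" line in the listing. The three sample records reuse titles, labels, and arXiv categories from the listing above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Paper:
    # Hypothetical record; these field names are assumptions, not the site's schema.
    title: str
    paper_type: str                    # "attack", "defense", "benchmark", "survey", or "tool"
    relevance: str                     # "high", "medium", or "low"
    arxiv_categories: Tuple[str, ...]  # e.g. ("cs.CR", "cs.AI")


def filter_papers(
    papers: Iterable[Paper],
    paper_type: Optional[str] = None,
    relevance: Optional[str] = None,
) -> List[Paper]:
    """Return papers matching the given type and relevance; None means 'all'."""
    return [
        p for p in papers
        if (paper_type is None or p.paper_type == paper_type)
        and (relevance is None or p.relevance == relevance)
    ]


def page_bounds(page: int, page_size: int, total: int) -> Tuple[int, int]:
    """Return the 1-based (first, last) item positions shown on a given page."""
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total)
    return first, last


# Three entries taken from the listing above.
SAMPLE = [
    Paper("Selective Adversarial Attacks on LLM Benchmarks", "benchmark", "high", ("cs.LG",)),
    Paper("Deep Research Brings Deeper Harm", "benchmark", "medium", ("cs.CR", "cs.CL")),
    Paper("VERA-MH Concept Paper", "benchmark", "low", ("cs.CY", "cs.AI", "cs.HC")),
]

high_relevance = filter_papers(SAMPLE, paper_type="benchmark", relevance="high")
print([p.title for p in high_relevance])
print(page_bounds(39, 20, 866))
```

Under the assumed page size of 20, `page_bounds(39, 20, 866)` gives `(761, 780)`, which matches the range shown in the listing. Passing `None` for a filter mirrors an "All" setting.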

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.