AI Security Research

AI Threat Alert indexes 3,037+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,037
Attack: 1,183
Benchmark: 868
Defense: 410
Tool: 319
Survey: 177

Showing 741–760 of 951 papers

Benchmark MEDIUM

Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?

Ruixin Yang, Ethan Mendes, Arthur Wang +4 more

Vision-language models (VLMs) have demonstrated strong performance in image geolocation, a capability further sharpened by frontier multimodal large...

4 months ago cs.CR cs.AI PDF

Attack MEDIUM

Comparative Insights on Adversarial Machine Learning from Industry and Academia: A User-Study Approach

Vishruti Kakkad, Paul Chung, Hanan Hibshi +1 more

An exponential growth of Machine Learning and its Generative AI applications brings with it significant security challenges, often referred to as...

4 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases

Casey Ford, Madison Van Doren, Emily Dix

Multimodal large language models (MLLMs) are increasingly deployed in real-world systems, yet their safety under adversarial prompting remains...

4 months ago cs.CL cs.AI cs.HC PDF

Attack MEDIUM

LiteToken: Removing Intermediate Merge Residues From BPE Tokenizers

Yike Sun, Haotong Yang, Zhouchen Lin +1 more

Tokenization is fundamental to how language models represent and process text, yet the behavior of widely used BPE tokenizers has received far less...

4 months ago cs.CL PDF

Attack MEDIUM

Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates

Ariel Fogel, Omer Hofman, Eilon Cohen +1 more

Open-weight language models are increasingly used in production settings, raising new security challenges. One prominent threat in this context is...

4 months ago cs.CR cs.LG PDF

Attack MEDIUM

A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness

Leo Schwinn, Moritz Ladenburger, Tim Beyer +3 more

Automated "LLM-as-a-Judge" frameworks have become the de facto standard for scalable evaluation across natural language processing. For...

4 months ago cs.CL cs.AI PDF

Benchmark MEDIUM

Trust The Typical

Debargha Ganguly, Sreehari Sankar, Biyao Zhang +8 more

Current approaches to LLM safety fundamentally rely on a brittle cat-and-mouse game of identifying and blocking known threats via guardrails. We...

4 months ago cs.CL cs.AI cs.DC PDF

Defense MEDIUM

RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models

Jiacheng Liang, Yuhui Wang, Tanqiu Jiang +1 more

Mixture-of-Experts (MoE) language models introduce unique challenges for safety alignment due to their sparse routing mechanisms, which can enable...

4 months ago cs.LG cs.AI cs.CR PDF

Tool MEDIUM

PriMod4AI: Lifecycle-Aware Privacy Threat Modeling for AI Systems using LLM

Gautam Savaliya, Robert Aufschläger, Abhishek Subedi +2 more

Artificial intelligence systems introduce complex privacy risks throughout their lifecycle, especially when processing sensitive or high-dimensional...

4 months ago cs.CR cs.AI PDF

Attack MEDIUM

Embracing Anisotropy: Turning Massive Activations into Interpretable Control Knobs for Large Language Models

Youngji Roh, Hyunjin Cho, Jaehyung Kim

Large Language Models (LLMs) exhibit highly anisotropic internal representations, often characterized by massive activations, a phenomenon where a...

4 months ago cs.CL PDF

Attack MEDIUM

RAPO: Risk-Aware Preference Optimization for Generalizable Safe Reasoning

Zeming Wei, Qiaosheng Zhang, Xia Hu +1 more

Large Reasoning Models (LRMs) have achieved tremendous success with their chain-of-thought (CoT) reasoning, yet also face safety issues similar to...

4 months ago cs.LG cs.AI cs.CL PDF

Defense MEDIUM

Semantic Consensus Decoding: Backdoor Defense for Verilog Code Generation

Guang Yang, Xing Hu, Xiang Chen +1 more

Large language models (LLMs) for Verilog code generation are increasingly adopted in hardware design, yet remain vulnerable to backdoor attacks where...

4 months ago cs.SE cs.CR PDF

Attack MEDIUM

Phantom Transfer: Data-level Defences are Insufficient Against Data Poisoning

Andrew Draganov, Tolga H. Dur, Anandmayi Bhongade +1 more

We present a data poisoning attack, Phantom Transfer, with the property that, even if you know precisely how the poison was placed into an...

4 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

Omar Abdelnasser, Fatemah Alharbi, Khaled Khasawneh +2 more

Safety alignment in Language Models (LMs) is fundamental for trustworthy AI. However, while different stakeholders are trying to leverage Arabic...

4 months ago cs.CL cs.AI PDF

Attack MEDIUM

Is It Possible to Make Chatbots Virtuous? Investigating a Virtue-Based Design Methodology Applied to LLMs

Matthew P. Lad, Louisa Conwill, Megan Levis Scheirer

With the rapid growth of Large Language Models (LLMs), criticism of their societal impact has also grown. Work in Responsible AI (RAI) has focused on...

4 months ago cs.HC PDF

Defense MEDIUM

Refining Decision Boundaries In Anomaly Detection Using Similarity Search Within the Feature Space

Sidahmed Benabderrahmane, Petko Valtchev, James Cheney +1 more

Detecting rare and diverse anomalies in highly imbalanced datasets, such as Advanced Persistent Threats (APTs) in cybersecurity, remains a fundamental...

4 months ago cs.LG cs.AI cs.CR PDF

Benchmark MEDIUM

Extracting Recurring Vulnerabilities from Black-Box LLM-Generated Software

Tomer Kordonsky, Maayan Yamin, Noam Benzimra +2 more

LLMs are increasingly used for code generation, but their outputs often follow recurring templates that can induce predictable vulnerabilities. We...

4 months ago cs.CR cs.AI PDF

Defense MEDIUM

Semantic Containment as a Fundamental Property of Emergent Misalignment

Rohan Saxena

Fine-tuning language models on narrowly harmful data causes emergent misalignment (EM): behavioral failures extending far beyond training...

4 months ago cs.CL cs.AI PDF

Attack MEDIUM

Monotonicity as an Architectural Bias for Robust Language Models

Patrick Cooper, Alireza Nadali, Ashutosh Trivedi +1 more

Large language models (LLMs) are known to exhibit brittle behavior under adversarial prompts and jailbreak attacks, even after extensive alignment...

4 months ago cs.CL cs.AI cs.CR PDF

Benchmark MEDIUM

Benchmarking Large Language Models for Zero-shot and Few-shot Phishing URL Detection

Najmul Hasan, Prashanth BusiReddyGari

The Uniform Resource Locator (URL), introduced in a connectivity-first era to define access and locate resources, remains historically limited,...

4 months ago cs.CR cs.AI PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended. It covers adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,037+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
