AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 81–100 of 3,023 papers

Attack HIGH

Confidently Wrong: Severity-Aware Calibration of Prompt-Injection Detectors under Attack Shift

Md Anas Biswas

Prompt-injection detectors are deployed as guards: a model scores an input and a downstream system trusts or blocks it on that score. I study the...

6 days ago cs.CR cs.AI cs.LG PDF

Benchmark HIGH

RAVEN: Agentic RAG for Automated Vulnerability Repair

Varun Gadey, Zijie Liu, Alexandra Dmitrienko

Automated vulnerability repair has emerged as a promising direction to mitigate the growing number of software vulnerabilities. Recent advances in...

6 days ago cs.CR cs.LG cs.SE PDF

Attack HIGH

The Scissors Effect: When Resize-Based Input Diversity Helps or Hurts Transfer Attacks

Yuhang Jiang, Xiaojing Chen

Input Diversity (DI), which applies random resizing and padding at each attack iteration, is a near-default ingredient of transfer-based adversarial...

6 days ago cs.LG cs.CR cs.CV PDF

Defense MEDIUM

What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?

Sihui Dai, Mann Patel

Prior work has shown that in-context demonstrations can jailbreak language models, but it remains unclear how models interpret different types of...

1 week ago cs.AI cs.LG PDF

Tool HIGH

Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems

Reza Soosahabi, Vivek Namsani

Agentic AI systems increasingly rely on language-model components to interpret instructions, process external data, invoke tools, and coordinate with...

1 week ago cs.CR cs.AI PDF

Benchmark HIGH

LLM agent safety, multi-turn red-teaming, jailbreak benchmarks, adversarial robustness, safety-critical systems

Hanwool Lee, Dasol Choi, Bokyeong Kim +2 more

Large language model (LLM) agents are increasingly proposed as supervisory components for safety-critical systems, yet their robustness under...

1 week ago cs.CR cs.AI PDF

Benchmark MEDIUM

Lagrange: An Open-Vocabulary, Energy-Based Sparse Framework for Generalized End-to-End Driving

Shihao Ji, HongXi Li, Zihui Song +1 more

Scaling end-to-end autonomous driving to complex, open-world environments requires perceptual models that generalize to anomalous scenarios and...

1 week ago cs.AI PDF

Benchmark MEDIUM

Quantization as a Malicious Task: Removing Quantization-Conditioned Backdoors via Task Arithmetic

Kaihsun Yang, Min-Yan Tsai, Chia-Mu Yu

Model quantization is widely adopted to reduce memory usage and inference cost when deploying deep neural networks on resource-constrained devices....

1 week ago cs.CR PDF

Benchmark HIGH

FFinRED: An Expert-Guided Benchmark Generation and Evaluation Framework for Financial LLM Red-Teaming

Chaeyun Kim, Daeyoung Park, Junghwan Kim +4 more

Existing safety benchmarks target general adversarial scenarios but miss finance-specific risks. Financial LLMs face regulatory compliance...

1 week ago cs.CR cs.AI PDF

Other MEDIUM

The Almost Intelligent Revolution: Options for Scaling Up Deliberation and Empowering People with AI

Serge Sharoff

The increasing prominence of Large Language Models (LLMs) in public discourse presents both opportunities and challenges for democratic deliberation....

1 week ago cs.CL PDF

Attack MEDIUM

Heterogeneous LLM Debate Under Adversarial Peers: Honest Gains, Replacement Costs, and Resilience

Prashanti Nilayam, Kiran Kumar Ramanna, Prashil Tumbade +1 more

Heterogeneous LLM debate is motivated by the promise that diverse peers correct one another, but the same exchange that carries correction also...

1 week ago cs.CR cs.MA PDF

Defense MEDIUM

Uncertainty-Aware Reward Modeling for Stable RLHF

Licheng Pan, Haocheng Yang, Haoxuan Li +7 more

Reinforcement learning from human feedback (RLHF) aligns large language models by training reward models on preference data and optimizing policies...

1 week ago cs.LG cs.AI PDF

Benchmark MEDIUM

SafeSpec: Fast and Safe LLM via Dynamic Reflective Sampling

Haotian Xu, Zeyang Zhang, Linbao Li +3 more

Speculative inference accelerates large language model (LLM) decoding but provides no inherent safety guarantees. Existing safety defenses are...

1 week ago cs.CR cs.AI PDF

Tool HIGH

A Layered Security Framework Against Prompt Injection in RAG-Based Chatbots

Gulshan Saleem, Nisar Ahmed, Muhammad Imran Zaman +1 more

Prompt injection is ranked as the most critical vulnerability in large language model (LLM) deployments by the OWASP Top 10 for LLM Applications, yet...

1 week ago cs.CR cs.CL PDF

Attack MEDIUM

Analyzing the Narration Gap in LLM-Solver Loops

Zunchen Huang, Songgaojun Deng

Formal tools such as SAT and SMT solvers are increasingly embedded in language model reasoning pipelines when a safety or security critical question...

1 week ago cs.AI cs.CR cs.LO PDF

Tool MEDIUM

FloatDoor: Platform-Triggered Backdoors in LLMs

Nils Loose, Jonas Sander, Felix Mächtle +1 more

Large language models (LLMs) are increasingly deployed in sensitive settings such as software engineering, where their outputs directly shape...

1 week ago cs.CR cs.LG PDF

Benchmark MEDIUM

Secure Coding Drift in LLM-Assisted Post-Quantum Cryptography Development: A Gamified Fix

R. D. N. Shakya, C. P. Wijesiriwardana, S. M. Vidanagamachchi +1 more

The transition to Post Quantum Cryptography (PQC) introduces considerable implementation complexity, requiring strict adherence to constant-time...

1 week ago cs.CR cs.AI cs.SE PDF

Benchmark LOW

TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology

Hannah Le, Ramesh Ramasamy, Alex Urrutia +3 more

Artificial intelligence (AI) agents promise to accelerate drug discovery by compressing interpretation and decision-making loops, but practical...

1 week ago cs.AI cs.LG PDF

Survey MEDIUM

Runtime Compliance Verification for AI Agents

Nafiseh Kahani, Masoud Barati, Diana Addae

AI agents now handle personal data through tool use, function calls, and multi turn dialogue, which can create obligations under the General Data...

1 week ago cs.SE PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
