AI Security Research

AI Threat Alert indexes 3,037+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total

3,037
Attack

1,183
Benchmark

868
Defense

410
Tool

319
Survey

177

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 861–880 of 946 papers

Clear filters

Benchmark MEDIUM

Efficient Privacy-Preserving Retrieval Augmented Generation with Distance-Preserving Encryption

Huanyi Ye, Jiale Guo, Ziyao Liu +1 more

RAG has emerged as a key technique for enhancing response quality of LLMs without high computational cost. In traditional architectures, RAG services...

5 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers

Yixuan Du, Chenxiao Yu, Haoyan Xu +3 more

Vision-Language Models (VLMs) are rapidly replacing unimodal encoders in modern retrieval and recommendation systems. While their capabilities are...

5 months ago cs.CL cs.AI cs.LG PDF

Benchmark MEDIUM

Less Is More -- Until It Breaks: Security Pitfalls of Vision Token Compression in Large Vision-Language Models

Xiaomei Zhang, Zhaoxi Zhang, Leo Yu Zhang +3 more

Visual token compression is widely adopted to improve the inference efficiency of Large Vision-Language Models (LVLMs), enabling their deployment in...

5 months ago cs.CR cs.AI PDF

Tool MEDIUM

Taming Various Privilege Escalation in LLM-Based Agent Systems: A Mandatory Access Control Framework

Zimo Ji, Daoyuan Wu, Wenyuan Jiang +5 more

Large Language Model (LLM)-based agent systems are increasingly deployed for complex real-world tasks but remain vulnerable to natural language-based...

5 months ago cs.CR PDF

Benchmark MEDIUM

Gradient Structure Estimation under Label-Only Oracles via Spectral Sensitivity

Jun Liu, Leo Yu Zhang, Fengpeng Li +2 more

Hard-label black-box settings, where only top-1 predicted labels are observable, pose a fundamentally constrained yet practically important feedback...

5 months ago cs.LG cs.CR PDF

Benchmark MEDIUM

Gradient Structure Estimation under Label-Only Oracles via Spectral Sensitivity

Jun Liu, Leo Yu Zhang, Fengpeng Li +2 more

Hard-label black-box settings, where only top-1 predicted labels are observable, pose a fundamentally constrained yet practically important feedback...

5 months ago cs.LG cs.CR PDF

Attack MEDIUM

Building Production-Ready Probes For Gemini

János Kramár, Joshua Engels, Zheng Wang +4 more

Frontier language model capabilities are improving rapidly. We thus need stronger mitigations against bad actors misusing increasingly powerful...

5 months ago cs.LG cs.AI cs.CL PDF

Attack MEDIUM

LoRA as Oracle

Marco Arazzi, Antonino Nocera

Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical...

5 months ago cs.CR cs.AI PDF

Tool MEDIUM

Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents

Kaiyu Zhou, Yongsen Zheng, Yicheng He +5 more

The agent--tool interaction loop is a critical attack surface for modern Large Language Model (LLM) agents. Existing denial-of-service (DoS) attacks...

5 months ago cs.CR cs.AI PDF

Tool MEDIUM

SecMLOps: A Comprehensive Framework for Integrating Security Throughout the MLOps Lifecycle

Xinrui Zhang, Pincan Zhao, Jason Jaskolka +2 more

Machine Learning (ML) has emerged as a pivotal technology in the operation of large and complex systems, driving advancements in fields such as...

5 months ago cs.CR cs.SE PDF

Attack MEDIUM

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

Christina Lu, Jack Gallagher, Jonathan Michala +2 more

Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training. We...

5 months ago cs.CL PDF

Survey MEDIUM

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

Yi Liu, Weizhe Wang, Ruitao Feng +5 more

The rise of AI agent frameworks has introduced agent skills, modular packages containing instructions and executable code that dynamically extend...

5 months ago cs.CR cs.AI cs.CL PDF

Attack MEDIUM

The Straight and Narrow: Do LLMs Possess an Internal Moral Path?

Luoming Hu, Jingjie Zeng, Liang Yang +1 more

Enhancing the moral alignment of Large Language Models (LLMs) is a critical challenge in AI safety. Current alignment techniques often act as...

5 months ago cs.CL PDF

Tool MEDIUM

ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

Yutao Mou, Zhangchi Xue, Lijun Li +4 more

While LLM-based agents can interact with environments via invoking external tools, their expanded capabilities also amplify security risks....

5 months ago cs.CL PDF

Defense MEDIUM

Understanding and Preserving Safety in Fine-Tuned LLMs

Jiawen Zhang, Yangfan Hu, Kejia Chen +7 more

Fine-tuning is an essential and pervasive functionality for applying large language models (LLMs) to downstream tasks. However, it has the potential...

5 months ago cs.LG cs.AI PDF

Survey MEDIUM

SoK: Privacy-aware LLM in Healthcare: Threat Model, Privacy Techniques, Challenges and Recommendations

Mohoshin Ara Tahera, Karamveer Singh Sidhu, Shuvalaxmi Dass +1 more

Large Language Models (LLMs) are increasingly adopted in healthcare to support clinical decision-making, summarize electronic health records (EHRs),...

5 months ago cs.CR cs.LG PDF

Tool MEDIUM

CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

Hanna Foerster, Tom Blanchard, Kristina Nikolić +6 more

AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss....

5 months ago cs.AI PDF

Benchmark MEDIUM

Blue Teaming Function-Calling Agents

Greta Dolcetti, Giulio Zizzo, Sergio Maffeis

We present an experimental evaluation that assesses the robustness of four open source LLMs claiming function-calling capabilities against three...

5 months ago cs.CR cs.AI PDF

Attack MEDIUM

UserLM-R1: Modeling Human Reasoning in User Language Models with Multi-Reward Reinforcement Learning

Feng Zhang, Shijia Li, Chunmao Zhang +7 more

User simulators serve as the critical interactive environment for agent post-training, and an ideal user simulator generalizes across domains and...

5 months ago cs.CL PDF

Defense MEDIUM

Beyond Simulations: What 20,000 Real Conversations Reveal About Mental Health AI Safety

Caitlin A. Stamatis, Jonah Meyerhoff, Richard Zhang +3 more

Large language models (LLMs) are increasingly used for mental health support, yet existing safety evaluations rely primarily on small,...

5 months ago cs.CY cs.CL PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,037+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial