AI Security Research

AI Threat Alert indexes 3,082+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,082
Attack: 1,196
Benchmark: 883
Defense: 421
Tool: 321
Survey: 181

Showing 1841–1860 of 3,082 papers

Attack HIGH

AJAR: Adaptive Jailbreak Architecture for Red-teaming

Yipu Dou, Wang Yang

Large language model (LLM) safety evaluation is moving from content moderation to action security as modern systems gain persistent state, tool...

5 months ago cs.CR cs.CL PDF

Tool MEDIUM

Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents

Kaiyu Zhou, Yongsen Zheng, Yicheng He +5 more

The agent-tool interaction loop is a critical attack surface for modern Large Language Model (LLM) agents. Existing denial-of-service (DoS) attacks...

5 months ago cs.CR cs.AI PDF

Benchmark HIGH

Hidden-in-Plain-Text: A Benchmark for Social-Web Indirect Prompt Injection in RAG

Haoze Guo, Ziqi Wei

Retrieval-augmented generation (RAG) systems put more and more emphasis on grounding their responses in user-generated content found on the Web,...

5 months ago cs.CR cs.HC PDF

Attack HIGH

Serverless AI Security: Attack Surface Analysis and Runtime Protection Mechanisms for FaaS-Based Machine Learning

Chetan Pathade, Vinod Dhimam, Sheheryar Ahmad +1 more

Serverless computing has achieved widespread adoption, with over 70% of AWS organizations using serverless solutions [1]. Meanwhile, machine learning...

5 months ago cs.CR cs.AI PDF

Defense HIGH

Multi-Agent Taint Specification Extraction for Vulnerability Detection

Jonah Ghebremichael, Saastha Vasan, Saad Ullah +6 more

Static Application Security Testing (SAST) tools using taint analysis are widely viewed as providing higher-quality vulnerability detection results...

5 months ago cs.CR cs.SE PDF

Tool MEDIUM

SecMLOps: A Comprehensive Framework for Integrating Security Throughout the MLOps Lifecycle

Xinrui Zhang, Pincan Zhao, Jason Jaskolka +2 more

Machine Learning (ML) has emerged as a pivotal technology in the operation of large and complex systems, driving advancements in fields such as...

5 months ago cs.CR cs.SE PDF

Tool LOW

Institutional AI: A Governance Framework for Distributional AGI Safety

Federico Pierucci, Marcello Galisai, Marcantonio Syrnikov Bracale +6 more

As LLM-based systems increasingly operate as agents embedded within human social and technical systems, alignment can no longer be treated as a...

5 months ago cs.CY PDF

Defense HIGH

Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay

Hao Wang, Yanting Wang, Hao Li +2 more

Large Language Models (LLMs) have achieved remarkable capabilities but remain vulnerable to adversarial "jailbreak" attacks designed to bypass...

5 months ago cs.CR cs.CL PDF

Attack HIGH

Defending Large Language Models Against Jailbreak Attacks via In-Decoding Safety-Awareness Probing

Yinzhi Zhao, Ming Wang, Shi Feng +3 more

Large language models (LLMs) have achieved impressive performance across natural language tasks and are increasingly deployed in real-world...

5 months ago cs.AI cs.CL PDF

Defense LOW

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Xingjun Ma, Yixu Wang, Hengyuan Xu +18 more

The rapid evolution of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has driven major gains in reasoning, perception, and...

5 months ago cs.AI cs.CL cs.CV PDF

Attack MEDIUM

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

Christina Lu, Jack Gallagher, Jonathan Michala +2 more

Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training. We...

5 months ago cs.CL PDF

Survey MEDIUM

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

Yi Liu, Weizhe Wang, Ruitao Feng +5 more

The rise of AI agent frameworks has introduced agent skills, modular packages containing instructions and executable code that dynamically extend...

5 months ago cs.CR cs.AI cs.CL PDF

Attack MEDIUM

The Straight and Narrow: Do LLMs Possess an Internal Moral Path?

Luoming Hu, Jingjie Zeng, Liang Yang +1 more

Enhancing the moral alignment of Large Language Models (LLMs) is a critical challenge in AI safety. Current alignment techniques often act as...

5 months ago cs.CL PDF

Attack HIGH

Reasoning Hijacking: Subverting LLM Classification via Decision-Criteria Injection

Yuansen Liu, Yixuan Tang, Anthony Kum Hoe Tun

Current LLM safety research predominantly focuses on mitigating Goal Hijacking, preventing attackers from redirecting a model's high-level objective...

5 months ago cs.CR PDF

Attack LOW

Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD

Murat Bilgehan Ertan, Marten van Dijk

Differentially Private Stochastic Gradient Descent (DP-SGD) is the dominant paradigm for private training, but its fundamental limitations under...

5 months ago cs.LG cs.CR PDF

Attack HIGH

ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack

Hao Li, Yankai Yang, G. Edward Suh +2 more

Large Language Models (LLMs) have enabled the development of powerful agentic systems capable of automating complex workflows across various fields....

5 months ago cs.CR cs.AI cs.CL PDF

Tool MEDIUM

ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

Yutao Mou, Zhangchi Xue, Lijun Li +4 more

While LLM-based agents can interact with environments via invoking external tools, their expanded capabilities also amplify security risks....

5 months ago cs.CL PDF

Defense MEDIUM

Understanding and Preserving Safety in Fine-Tuned LLMs

Jiawen Zhang, Yangfan Hu, Kejia Chen +7 more

Fine-tuning is an essential and pervasive functionality for applying large language models (LLMs) to downstream tasks. However, it has the potential...

5 months ago cs.LG cs.AI PDF

Survey MEDIUM

SoK: Privacy-aware LLM in Healthcare: Threat Model, Privacy Techniques, Challenges and Recommendations

Mohoshin Ara Tahera, Karamveer Singh Sidhu, Shuvalaxmi Dass +1 more

Large Language Models (LLMs) are increasingly adopted in healthcare to support clinical decision-making, summarize electronic health records (EHRs),...

5 months ago cs.CR cs.LG PDF

Tool MEDIUM

CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

Hanna Foerster, Tom Blanchard, Kristina Nikolić +6 more

AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss....

5 months ago cs.AI PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,082+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
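
As a rough illustration of what classification by type and relevance could look like in practice, the sketch below models a few entries from the listing above as records and filters them. The `Paper` class, its field names, and the `filter_papers` and `count_by_type` helpers are hypothetical and chosen for this example only; they are not AI Threat Alert's actual schema or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical record for one indexed paper (not the site's real schema)."""
    title: str
    paper_type: str               # "Attack", "Defense", "Benchmark", "Survey", or "Tool"
    relevance: str                # "HIGH", "MEDIUM", or "LOW"
    categories: tuple[str, ...]   # arXiv categories, e.g. ("cs.CR", "cs.CL")


# Three entries copied from the listing above.
PAPERS = [
    Paper(
        "AJAR: Adaptive Jailbreak Architecture for Red-teaming",
        "Attack", "HIGH", ("cs.CR", "cs.CL"),
    ),
    Paper(
        "Hidden-in-Plain-Text: A Benchmark for Social-Web Indirect Prompt Injection in RAG",
        "Benchmark", "HIGH", ("cs.CR", "cs.HC"),
    ),
    Paper(
        "SecMLOps: A Comprehensive Framework for Integrating Security Throughout the MLOps Lifecycle",
        "Tool", "MEDIUM", ("cs.CR", "cs.SE"),
    ),
]


def filter_papers(papers, paper_type=None, relevance=None):
    """Return papers matching the given type and relevance; None means no filter."""
    return [
        p for p in papers
        if (paper_type is None or p.paper_type == paper_type)
        and (relevance is None or p.relevance == relevance)
    ]


def count_by_type(papers):
    """Tally papers per type, similar in spirit to the per-type totals on this page."""
    return Counter(p.paper_type for p in papers)
```

For example, `filter_papers(PAPERS, relevance="HIGH")` returns the AJAR and Hidden-in-Plain-Text records, and `count_by_type(PAPERS)` reports one paper each for Attack, Benchmark, and Tool.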

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
