AI Security Research

AI Threat Alert indexes 3,037+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total: 3,037
Attack: 1,183
Benchmark: 868
Defense: 410
Tool: 319
Survey: 177


Showing 561–580 of 954 papers


Defense MEDIUM

ADVERSA: Measuring Multi-Turn Guardrail Degradation and Judge Reliability in Large Language Models

Harry Owiredu-Ashley

Most adversarial evaluations of large language model (LLM) safety assess single prompts and report binary pass/fail outcomes, which fails to capture...

3 months ago cs.CR cs.AI cs.CL PDF

Tool MEDIUM

FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation

Yinpeng Wu, Yitong Chen, Lixiang Wang +3 more

Device-side Large Language Models (LLMs) have witnessed explosive growth, offering higher privacy and availability compared to cloud-side LLMs....

3 months ago cs.CR cs.LG cs.OS PDF

Attack MEDIUM

LLM-Agent Interactions on Markets with Information Asymmetries

Alexander Erlei, Lukas Meub

As AI agents increasingly act on behalf of human stakeholders in economic settings, understanding their behavior in complex market environments...

3 months ago econ.GN PDF

Defense MEDIUM

DistillGuard: Evaluating Defenses Against LLM Knowledge Distillation

Bo Jiang

Knowledge distillation from proprietary LLM APIs poses a growing threat to model providers, yet defenses against this attack remain fragmented and...

3 months ago cs.CR cs.AI cs.CL PDF

Defense MEDIUM

VoiceSHIELD-Small: Real-Time Malicious Speech Detection and Transcription

Sumit Ranjan, Sugandha Sharma, Ubaid Abbas +1 more

Voice interfaces are quickly becoming a common way for people to interact with AI systems. This also brings new security risks, such as prompt...

3 months ago cs.SD cs.AI PDF

Benchmark MEDIUM

Models as Lego Builders: Assembling Malice from Benign Blocks via Semantic Blueprints

Chenxi Li, Xianggan Liu, Dake Shen +9 more

Despite the rapid progress of Large Vision-Language Models (LVLMs), the integration of visual modalities introduces new safety vulnerabilities that...

3 months ago cs.CV cs.LG PDF

Tool MEDIUM

Give Them an Inch and They Will Take a Mile: Understanding and Measuring Caller Identity Confusion in MCP-Based AI Systems

Yuhang Huang, Boyang Ma, Biwei Yan +5 more

The Model Context Protocol (MCP) is an open and standardized interface that enables large language models (LLMs) to interact with external tools and...

3 months ago cs.CR cs.AI PDF

Tool MEDIUM

Where Do LLM-based Systems Break? A System-Level Security Framework for Risk Assessment and Treatment

Neha Nagaraja, Hayretdin Bahsi

Large Language Models (LLMs) are increasingly integrated into safety-critical workflows, yet existing security analyses remain fragmented and often...

3 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Backdoor4Good: Benchmarking Beneficial Uses of Backdoors in LLMs

Yige Li, Wei Zhao, Zhe Li +6 more

Backdoor mechanisms have traditionally been studied as security threats that compromise the integrity of machine learning models. However, the same...

3 months ago cs.CR cs.AI PDF

Attack MEDIUM

Detecting Cryptographically Relevant Software Packages with Collaborative LLMs

Eduard Hirsch, Kristina Raab, Tobias J. Bauer +1 more

IT systems are facing an increasing number of security threats, including advanced persistent attacks and future quantum-computing vulnerabilities....

3 months ago cs.CR cs.IR PDF

Benchmark MEDIUM

Governance Architecture for Autonomous Agent Systems: Threats, Framework, and Engineering Practice

Yuxu Ge

Autonomous agents powered by large language models introduce a class of execution-layer vulnerabilities -- prompt injection, retrieval poisoning, and...

3 months ago cs.CR cs.AI PDF

Tool MEDIUM

Can Safety Emerge from Weak Supervision? A Systematic Analysis of Small Language Models

Punyajoy Saha, Sudipta Halder, Debjyoti Mondal +1 more

Safety alignment is critical for deploying large language models (LLMs) in real-world applications, yet most existing approaches rely on large...

3 months ago cs.CL cs.AI cs.LG PDF

Survey MEDIUM

ESAA-Security: An Event-Sourced, Verifiable Architecture for Agent-Assisted Security Audits of AI-Generated Code

Elzo Brito dos Santos Filho

AI-assisted software generation has increased development speed, but it has also amplified a persistent engineering problem: systems that are...

3 months ago cs.CR cs.AI PDF

Attack MEDIUM

SPOILER: TEE-Shielded DNN Partitioning of On-Device Secure Inference with Poison Learning

Donghwa Kang, Hojun Choe, Doohyun Kim +2 more

Deploying deep neural networks (DNNs) on edge devices exposes valuable intellectual property to model-stealing attacks. While TEE-shielded DNN...

3 months ago cs.CR PDF

Defense MEDIUM

Proof-of-Guardrail in AI Agents and What (Not) to Trust from It

Xisen Jin, Michael Duan, Qin Lin +4 more

As AI agents become widely deployed as online services, users often rely on an agent developer's claim about how safety is enforced, which introduces...

3 months ago cs.CR cs.AI cs.CL PDF

Defense MEDIUM

Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models

Jinman Wu, Yi Xie, Shen Lin +2 more

Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection automatically triggers refusal. However, the...

3 months ago cs.CR cs.AI cs.LG PDF

Defense MEDIUM

Revisiting the (Sub)Optimality of Best-of-N for Inference-Time Alignment

Ved Sriraman, Adam Block

Best-of-N (BoN) sampling is a widely used inference-time alignment method for language models, whereby N candidate responses are sampled from a...

3 months ago cs.LG cs.AI PDF

Benchmark MEDIUM

Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation

Xiaoguang Li, Hanyi Wang, Yaowei Huang +6 more

Shuffler-based differential privacy (shuffle-DP) is a privacy paradigm providing high utility by involving a shuffler to permute noisy reports from...

3 months ago cs.CR PDF

Attack MEDIUM

Good-Enough LLM Obfuscation (GELO)

Anatoly Belikov, Ilya Fedotov

Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV...

3 months ago cs.CR cs.LG PDF

Defense MEDIUM

ThaiSafetyBench: Assessing Language Model Safety in Thai Cultural Contexts

Trapoom Ukarapol, Nut Chukamphaeng, Kunat Pipatanakul +1 more

The safety evaluation of large language models (LLMs) remains largely centered on English, leaving non-English languages and culturally grounded...

3 months ago cs.CL PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,037+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.
