AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 21–34 of 34 papers

Defense MEDIUM

Categorical Robustness Assessment for Machine Learning based Network Intrusion Detection Systems

Mayank Raj, Nathaniel D. Bastian, Lance Fiondella +1 more

Network Intrusion Detection Systems (NIDS) heavily utilize Machine Learning (ML), but ML models can be manipulated via adversarial attacks. These...

2 weeks ago cs.CR cs.LG PDF

Defense MEDIUM

Online Shift Detection and Conformal Adaptation for Deployed Safety Classifiers

Jun Wen Leong

We present an online monitoring system for distributional shift in deployed safety classifiers, using calibrated sequential statistics to detect when...

2 weeks ago cs.LG cs.CR stat.ML PDF

Defense MEDIUM

Dummy Backdoor as a Defense: Removing Unknown Backdoors via Shared Internal Mechanisms for Generative LLMs

Kazuki Iwahana, Masaru Matsubayashi, Takuma Koyama +3 more

Backdoor attacks pose a serious threat to the safety and reliability of Large Language Models (LLMs), as they cause models to behave normally on...

2 weeks ago cs.CR cs.CL PDF

Defense LOW

Hiding the Trees in the Forest: Building Network Covert Channels with Hash-Based Covert Carrier Filtering

Zexiao Zou, Zhiqiang Wang, Baoxu Liu +2 more

As an effective anti-censorship mechanism, network covert channels can provide data privacy protection and ensure communication security. However,...

2 weeks ago cs.CR PDF

Defense MEDIUM

Comparative Analysis of Inference-Time Defense Methods for Multimodal Large Language Models

Bulat Nutfullin, Vladimir Evgrafov, Dmitry Namiot

Multimodal large language models (MLLMs) now appear in safety-critical applications, but the visual channel leaves them open to adversarial attacks...

2 weeks ago cs.CR PDF

Defense LOW

Gradient-Guided Reward Optimization for Inference-time Alignment

Hankun Lin, Ruqi Zhang

Ensuring the reliability of Large Language Models (LLMs) under distribution drift requires inference-time adaptation. While inference-time alignment...

2 weeks ago cs.CL cs.LG PDF

Defense LOW

Unveiling Privacy Risks in Multi-modal Large Language Models: Task-specific Vulnerabilities and Mitigation Challenges

Tiejin Chen, Pingzhi Li, Kaixiong Zhou +2 more

Privacy risks in text-only Large Language Models (LLMs) are well studied, particularly their tendency to memorize and leak sensitive information....

2 weeks ago cs.CR cs.AI PDF

Defense LOW

Autonomous Incident Resolution at Hyperscale: An Agentic AI Architecture for Network Operations

Arun Malik

Cloud network infrastructure at hyperscale presents unique operational challenges where traditional human-driven incident response cannot keep pace...

2 weeks ago cs.SE cs.AI cs.ET PDF

Defense LOW

Personalization Meets Safety: Mechanisms, Risks, and Mitigations in Personalized LLMs

Yanyan Luo, Xue Han, Ruiqiao Bai +10 more

Large Language Models (LLMs) have enabled increasingly personalized interactions by adapting to users' preferences, contexts, and long-term...

2 weeks ago cs.AI PDF

Defense MEDIUM

Sample-Efficient LLM-Based Detection of Malicious Web Server Logs with Forensically Explainable Reasoning

Bernhard Kneip, Nhien-An Le-Khac, Hong-Hanh Nguyen-Le

Forensic analysis of web server logs demands both accurate detection and human-readable explanations that can satisfy legal requirements. We present...

2 weeks ago cs.CR cs.AI PDF

Defense HIGH

RedEdit: Agentic Red-Teaming of Image Safety Classifiers via MCTS-Guided Photo-Editing

Weilin Lin, Ziqi Lin, Zhenxing Zhou +4 more

Image safety classifiers serve as a critical component of contemporary content moderation systems on the internet. However, their resilience against...

3 weeks ago cs.CR PDF

Defense MEDIUM

SpeechJBB: Probing Safety Alignment and Comprehension in Large Audio Language Models under Code-Switched Speech

Virginia Ceccatelli, Yejin Jeon, David Ifeoluwa Adelani

Large audio language models (LALMs) are increasingly deployed in real-world applications, yet their safety alignment is still primarily evaluated on...

3 weeks ago cs.SD eess.AS PDF

Defense MEDIUM

Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense

Minseok Choi, Seungbin Yang, Dongjin Kim +5 more

Despite advances in safety alignment, large language models remain vulnerable to continuously evolving jailbreaks. Existing fine-tuned safety...

3 weeks ago cs.CR cs.CL PDF

Defense HIGH

AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations

Hiskias Dingeto, Will Leeney

Indirect prompt injection in tool-use agents is a concrete production threat: LLM agents read from integrations (third-party services such as Gmail,...

3 weeks ago cs.CR cs.AI cs.CL PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
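
The classification described above can be pictured as a small record per paper plus a filter over type and relevance. The following is only a minimal sketch under stated assumptions: the `Paper` class, its field names, the `related_cves` field, and the `filter_papers` helper are all hypothetical and are not AI Threat Alert's actual schema or API. The two sample entries reuse the type and relevance labels shown in the listing on this page.

```python
from dataclasses import dataclass, field
from enum import Enum


class PaperType(Enum):
    ATTACK = "attack"
    DEFENSE = "defense"
    BENCHMARK = "benchmark"
    SURVEY = "survey"
    TOOL = "tool"


class Relevance(Enum):
    # Ordered so that a minimum-relevance filter can compare values.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Paper:
    """Hypothetical record for one indexed paper (field names are assumptions)."""
    title: str
    paper_type: PaperType
    relevance: Relevance
    arxiv_categories: list[str] = field(default_factory=list)
    related_cves: list[str] = field(default_factory=list)  # assumed cross-reference field


def filter_papers(papers, paper_type=None, min_relevance=Relevance.LOW):
    """Return papers matching an optional type and a minimum relevance level."""
    return [
        p for p in papers
        if (paper_type is None or p.paper_type is paper_type)
        and p.relevance.value >= min_relevance.value
    ]


# Sample entries taken from the listing on this page.
papers = [
    Paper(
        "RedEdit: Agentic Red-Teaming of Image Safety Classifiers via MCTS-Guided Photo-Editing",
        PaperType.DEFENSE,
        Relevance.HIGH,
        ["cs.CR"],
    ),
    Paper(
        "Gradient-Guided Reward Optimization for Inference-time Alignment",
        PaperType.DEFENSE,
        Relevance.LOW,
        ["cs.CL", "cs.LG"],
    ),
]

high = filter_papers(papers, PaperType.DEFENSE, Relevance.HIGH)
print([p.title for p in high])
```

Modeling relevance as an ordered enum keeps "High and above" style filters to a single comparison; a real index would likely also need date filtering and pagination, which this sketch leaves out.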

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
