AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 221–240 of 1,175 papers

Attack HIGH

Conceal, Reconstruct, Jailbreak: Exploiting the Reconstruction-Concealment Tradeoff in MLLMs

Md Farhamdur Reza, Richeng Jin, Tianfu Wu +1 more

Intent-obfuscation-based jailbreak attacks on multimodal large language models (MLLMs) transform a harmful query into a concealed multimodal input to...

1 month ago cs.AI PDF

Attack HIGH

PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI

Wesley Hanwen Deng, Mingxi Yan, Sunnie S. Y. Kim +5 more

Recent developments in AI safety research have called for red-teaming methods that effectively surface potential risks posed by generative AI models,...

1 month ago cs.HC cs.AI cs.CY PDF

Attack MEDIUM

Architecture Matters: Comparing RAG Systems under Knowledge Base Poisoning

Samuel Korn

Retrieval-Augmented Generation (RAG) systems are vulnerable to knowledge base poisoning, yet existing attacks have been evaluated almost exclusively...

1 month ago cs.CR cs.CL cs.LG PDF

Attack MEDIUM

Information Theoretic Adversarial Training of Large Language Models

Yiwei Zhang, Jeremiah Birrell, Reza Ebrahimi +3 more

Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors...

1 month ago cs.LG cs.AI cs.CR PDF

Attack MEDIUM

On the Hardness of Junking LLMs

Marco Rando, Samuel Vaiter

Large language models (LLMs) are known to be vulnerable to jailbreak attacks, which typically rely on carefully designed prompts containing explicit...

1 month ago cs.LG PDF

Attack HIGH

Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

Zheng Fang, Xiaosen Wang, Shenyi Zhang +2 more

Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire...

1 month ago cs.CR cs.AI cs.CL PDF

Attack MEDIUM

Gray-Box Poisoning of Continuous Malware Ingestion Pipelines

Jan Dolejš, Martin Jureček, Róbert Lórencz

Modern malware detection pipelines rely on continuous data ingestion and machine learning to counter the high volume of novel threats. This work...

1 month ago cs.CR cs.LG PDF

Attack HIGH

Misrouter: Exploiting Routing Mechanisms for Input-Only Attacks on Mixture-of-Experts LLMs

Zekun Fei, Zihao Wang, Weijie Liu +4 more

Mixture-of-Experts (MoE) architectures have emerged as a leading paradigm for scaling large language models through sparse, routing-based...

1 month ago cs.CR PDF

Attack MEDIUM

Laundering AI Authority with Adversarial Examples

Jie Zhang, Pura Peetathawatchai, Florian Tramèr +1 more

Vision-language models (VLMs) are increasingly deployed as trusted authorities -- fact-checking images on social media, comparing products, and...

1 month ago cs.CR cs.LG PDF

Attack MEDIUM

Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions

Sarthak Choudhary, Atharv Singh Patlan, Nils Palumbo +3 more

We present Sparse Backdoor, a supply-chain attack that plants a provably undetectable backdoor in pre-trained image classifiers, including...

1 month ago cs.CR cs.AI cs.LG PDF

Attack HIGH

Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours

Raja Sekhar Rao Dheekonda, Will Pearce, Nick Landers

AI systems are entering critical domains like healthcare, finance, and defense, yet remain vulnerable to adversarial attacks. While AI red teaming is...

1 month ago cs.AI cs.CR PDF

Attack HIGH

Generating Proof-of-Vulnerability Tests to Help Enhance the Security of Complex Software

Shravya Kanchi, Xiaoyan Zang, Ying Zhang +2 more

Developers create modern software applications (Apps) on top of third-party libraries (Libs). When library vulnerabilities are reachable through...

1 month ago cs.CR cs.SE PDF

Attack MEDIUM

The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code

Gabriel Hortea, Juan Tapiador

Malware authors have traditionally relied on polymorphic techniques to produce variants in the same malware family, complicating signature-based...

1 month ago cs.CR PDF

Attack HIGH

Membership Inference Attacks for Retrieval Based In-Context Learning for Document Question Answering

Tejas Kulkarni, Antti Koskela, Laith Zumot

We show that remotely hosted applications employing in-context learning when augmented with a retrieval function to select in-context examples can be...

1 month ago cs.CR cs.LG PDF

Attack MEDIUM

MEMSAD: Gradient-Coupled Anomaly Detection for Memory Poisoning in Retrieval-Augmented Agents

Ishrith Gowda

Persistent external memory enables LLM agents to maintain context across sessions, yet its security properties remain formally uncharacterized. We...

1 month ago cs.CR cs.AI cs.LG PDF

Attack HIGH

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

Shihao Weng, Yang Feng, Jinrui Zhang +3 more

The rise of Large Language Model (LLM) agents, augmented with tool use, skills, and external knowledge, has introduced new security risks. Among...

1 month ago cs.CR cs.SE PDF

Attack MEDIUM

Dependency-Aware Privacy for Multi-turn Agents

Divyam Anshumaan, Sarthak Choudhary, Nils Palumbo +1 more

LLM agents release private data across multi-service interactions. Existing prompt sanitizers based on metric differential privacy treat each release...

1 month ago cs.CR PDF

Attack MEDIUM

PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization

Mingshuo Liu, Yiwei Zha, Min Chen

Browsing-enabled LLM assistants can fetch webpages and answer contact-seeking queries, creating a practical channel for scraping contact-style...

1 month ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Revisiting JBShield: Breaking and Rebuilding Representation-Level Jailbreak Defenses

Kemal Derya, Berk Sunar

Defending large language models (LLMs) against jailbreak attacks, such as Greedy Coordinate Gradient (GCG), remains a challenge, particularly under...

1 month ago cs.CR PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
