AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security, covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 21–40 of 950 papers

Benchmark MEDIUM

PointVG-R: Internalizing Geometric Reasoning in MLLMs for Precise Pointing Localization via Visual Chain of Thought

Ling Li, Bowen Liu, Zinuo Zhan +5 more

Pointing-based visual grounding requires models to precisely locate target objects by deciphering complex spatial relationships between the visual...

4 days ago cs.CV PDF

Defense MEDIUM

Poster: Exploring the Limits of Audio-Based Detection of Turkish Phone Call Scams

Arda Eren, Micheal Cheung, Youqian Zhang +2 more

Scam phone calls exploit vulnerable communities worldwide, yet research on detection has focused almost exclusively on English and other...

4 days ago cs.CL cs.AI PDF

Attack MEDIUM

Poisoned Playbooks: Demystifying Knowledge Poisoning Effects on AI Security Agents

Juho Park, Hyunmin Choi, Kevin Nam

AI security agents increasingly rely on Retrieval-Augmented Generation (RAG) to use external security knowledge for vulnerability analysis and...

4 days ago cs.CR PDF

Attack MEDIUM

Securing LLM-Agent Long-Term Memory Against Poisoning: Non-Malleable, Origin-Bound Authority with Machine-Checked Guarantees

Yedidel Louck

LLM agents increasingly rely on persistent long-term memory, which creates a critical vulnerability that we study here: memory poisoning. An...

4 days ago cs.CR PDF

Attack MEDIUM

Pigeonholing: Bad prompts hurt models to collapse and make mistakes

Hyunji Nam, Keertana Chidambaram, Dorottya Demszky +1 more

While in-context learning is generally shown to be effective in Large Language Models (LLMs), bad contexts can cause performance degradation and mode...

4 days ago cs.CL cs.AI PDF

Benchmark MEDIUM

Distributed Quality-Diversity Search for Toxicity in Large Language Models

Onkar Shelar, Travis Desell

Large Language Models remain vulnerable to adversarial prompts that elicit harmful responses, and scaling red-teaming to cover a broad range of...

4 days ago cs.NE PDF

Survey MEDIUM

One Year Later...The Harms Persist, But So Do We!

Annika Marie Schoene, Cansu Canca, Gautham Vijay Kumar +1 more

General-purpose large language models (LLMs) are increasingly used for mental health-related conversations, yet safety safeguards remain inadequate...

5 days ago cs.CL cs.AI PDF

Attack MEDIUM

TROPT: An Open Framework for Unifying and Advancing Discrete Text Optimization

Matan Ben-Tov, Mahmood Sharif

Discrete text-trigger optimization -- searching for text sequences that, when ingested by a model, steer it toward a specified objective -- underpins...

5 days ago cs.LG cs.CR PDF

Benchmark MEDIUM

Detecting Malicious Agent Skills in the Wild using Attention

Bacem Etteib, Daniele Lunghi, Tégawendé F. Bissyandé

LLM agents increasingly load skills, file-based packages of natural-language instructions written by third parties and distributed through...

5 days ago cs.CR cs.AI PDF

Tool MEDIUM

FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation

Yinpeng Wu, Yitong Chen, Lixiang Wang +3 more

Device-side Large Language Models (LLMs) have grown explosively, offering stronger privacy and higher availability than their cloud-side...

5 days ago cs.CR cs.LG cs.OS PDF

Benchmark MEDIUM

TooBad: Backdoor Diffusion Models with Ultra-Low Poison Rate and Imperceptible Trigger

Vu Tuan Truong, Long Bao Le

Diffusion models (DMs), despite their impressive capabilities across a wide range of generative tasks, have been shown to be vulnerable to backdoor...

5 days ago cs.CR cs.CV PDF

Benchmark MEDIUM

GIF: Locally Sound Geometric Information Flow Control for LLMs

Adam Storek, Nikolaus Holzer, Zhuo Zhang +1 more

Large language models increasingly mediate interactions between sensitive data, untrusted inputs, and privileged actions in agentic systems, creating...

5 days ago cs.AI PDF

Benchmark MEDIUM

Exposing the Illusion of Erasure in Knowledge Editing for LLMs

Advik Raj Basani, Anshuman Chhabra

Knowledge Editing (KE) has emerged as a frontier for updating specific facts in LLMs without costly retraining, but its reliability and underlying...

5 days ago cs.LG cs.AI cs.CR PDF

Other MEDIUM

Memory Contagion: Cross-Temporal Propagation of Evaluator Bias via Agent Memory

Zewen Liu

Large Language Model (LLM) agents increasingly rely on memory systems to maintain long-term coherence. Recent work shows that agent memories degrade...

5 days ago cs.LG cs.AI cs.CL PDF

Attack MEDIUM

T-VSS: Test-Time Visual Subspace Steering for Adversarial Robustness of Vision-Language Models

Jaehyuk Jang, Minseok Seo, Seungju Cho, Kangwook Ko +1 more

Vision-language models (VLMs) achieve strong zero-shot recognition, but they remain highly vulnerable to adversarial perturbations. Recent test-time...

5 days ago cs.CV PDF

Survey MEDIUM

Understanding the (In)Security of Vibe-Coded Applications

Junquan Deng, Zhiyu Fan, Ruijie Meng

Recent advances in large language models (LLMs) have enabled vibe coding, an emerging software development paradigm in which users create...

5 days ago cs.CR cs.SE PDF

Tool MEDIUM

Safety in Self-Evolving LLM Agent Systems: Threats, Amplification, and Case Studies

Ruixiao Lin, Xinhao Deng, Qingming Li +12 more

Self-evolving LLM agent systems, which autonomously update their model parameters, memory, tools, and architectures, introduce a qualitatively new...

5 days ago cs.CR cs.AI PDF

Defense MEDIUM

CITADEL: CSI-Based Jamming Detection and Open-Set Classification for IIoT Networks

Aymen Bouferroum, Ildi Alla, Valeria Loscri +2 more

Radio frequency jamming poses a critical threat to the availability of wireless Industrial Internet of Things (IIoT) networks. Existing detection and...

5 days ago cs.CR cs.LG cs.NI PDF

Benchmark MEDIUM

IndicGuard: A Multilingual Safety Guard Model and Dataset for Indic Languages

Parth Bramhecha, Smit Deshmukh, Sairaj Bodhale +2 more

As Large Language Models (LLMs) achieve widespread integration across diverse linguistic landscapes, ensuring their safety and alignment with...

5 days ago cs.CL cs.LG PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended. It covers adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
