AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.


Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 1–20 of 42 papers

Attack MEDIUM

A Deterministic Control Plane for LLM Coding Agents

Padmaraj Madatha

LLM coding harnesses grant agents broad file and shell access, yet the configuration layer that steers them -- rules files, agent definitions,...

2 days ago cs.SE cs.AI cs.CR PDF

Attack MEDIUM

At the Edge of Understanding: Sparse Autoencoders Trace The Limits of Transformer Generalization

Praneet Suresh, Jack Stanley, Sonia Joseph +2 more

Pre-trained transformers have demonstrated remarkable generalization abilities, at times extending beyond the scope of their training data. Yet,...

2 days ago cs.LG PDF

Attack MEDIUM

TEMPO-Diffusion: Temporally Exposed Malicious Poisoning of Diffusion Models

William Aiken, Paula Branco, Guy-Vincent Jourdan +1 more

Noise-based backdoor attacks on diffusion models typically rely on input-time trigger injection, untargeted activation, and out-of-distribution...

3 days ago cs.CR cs.AI PDF

Attack MEDIUM

Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning

Poojitha Thota, Shirin Nilizadeh

Training-time data poisoning during fine-tuning poses a significant threat to large language models (LLMs) deployed for abstractive text...

3 days ago cs.CL cs.CR PDF

Attack MEDIUM

The Role of Input Dimensionality in the Emergence and Targeted Control of Adversarial Examples

Nasrin Malekzadeh Goradel, Niccolo Pancino, Yaser Gholizade Atani +3 more

Several theoretical works have tried to explain the adversarial vulnerability of deep neural networks through properties of high-dimensional...

3 days ago stat.ML cs.CR cs.LG PDF

Attack MEDIUM

LLMs Prompted for Legal Context Object More: Overrefusal from Small On-Premises LLMs in Criminal Legal Context

Anastasiia Kucherenko, François Brouchoud, Dimitri Percia David +1 more

While the validity of LLMs' use in the legal context remains subject to ethical and legal debate, legal professionals are already experimenting with...

4 days ago cs.AI PDF

Attack MEDIUM

Poisoned Playbooks: Demystifying Knowledge Poisoning Effects on AI Security Agents

Juho Park, Hyunmin Choi, Kevin Nam

AI security agents increasingly rely on Retrieval-Augmented Generation (RAG) to use external security knowledge for vulnerability analysis and...

4 days ago cs.CR PDF

Attack MEDIUM

Securing LLM-Agent Long-Term Memory Against Poisoning: Non-Malleable, Origin-Bound Authority with Machine-Checked Guarantees

Yedidel Louck

LLM agents increasingly rely on persistent long-term memory, which creates a critical vulnerability that we study here: memory poisoning. An...

4 days ago cs.CR PDF

Attack MEDIUM

Pigeonholing: Bad prompts hurt models to collapse and make mistakes

Hyunji Nam, Keertana Chidambaram, Dorottya Demszky +1 more

While in-context learning is generally shown to be effective in Large Language Models (LLMs), bad contexts can cause performance degradation and mode...

4 days ago cs.CL cs.AI PDF

Attack MEDIUM

TROPT: An Open Framework for Unifying and Advancing Discrete Text Optimization

Matan Ben-Tov, Mahmood Sharif

Discrete text-trigger optimization -- searching for text sequences that, when ingested by a model, steer it toward a specified objective -- underpins...

5 days ago cs.LG cs.CR PDF

Attack MEDIUM

T-VSS: Test-Time Visual Subspace Steering for Adversarial Robustness of Vision-Language Models

Jaehyuk Jang, Minseok Seo, Seungju Cho, Kangwook Ko +1 more

Vision-language models (VLMs) achieve strong zero-shot recognition, but they remain highly vulnerable to adversarial perturbations. Recent test-time...

5 days ago cs.CV PDF

Attack MEDIUM

Heterogeneous LLM Debate Under Adversarial Peers: Honest Gains, Replacement Costs, and Resilience

Prashanti Nilayam, Kiran Kumar Ramanna, Prashil Tumbade +1 more

Heterogeneous LLM debate is motivated by the promise that diverse peers correct one another, but the same exchange that carries correction also...

1 week ago cs.CR cs.MA PDF

Attack MEDIUM

Analyzing the Narration Gap in LLM-Solver Loops

Zunchen Huang, Songgaojun Deng

Formal tools such as SAT and SMT solvers are increasingly embedded in language model reasoning pipelines when a safety or security critical question...

1 week ago cs.AI cs.CR cs.LO PDF

Attack MEDIUM

Stealthy World Model Manipulation via Data Poisoning

Yibin Hu, Xiaolin Sun, Zizhan Zheng

Model-based learning agents use learned world models to predict future states, plan actions, and adapt to new environments. However, the process of...

1 week ago cs.LG cs.CR cs.RO PDF

Attack MEDIUM

KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing

Mufei Li, Shikun Liu, Dongqi Fu +5 more

Post-hoc context erasing over the KV cache is challenging because a local edit has a global consequence: once a span has been processed, its...

1 week ago cs.CL cs.LG PDF

Attack MEDIUM

Cross-Silo De-Anonymization Under Local Differential Privacy: Threat Model, Phase Transition, and Coordination Necessity

Ziniu Liu, Aiping Li

When a person's records appear in k independent data silos, each protected by (epsilon, delta)-differential privacy, standard composition yields a...

1 week ago cs.CR cs.IT cs.LG PDF

Attack MEDIUM

The Proxy Knows Too Much: Sealing LLM API Routers with Attested TEEs

Sipeng Xie, Qianhong Wu, Hengrun Lu +4 more

Agents increasingly access large language models (LLMs) through API routers. A router terminates the client's transport-layer security session and...

1 week ago cs.CR cs.AI cs.ET PDF

Attack MEDIUM

Gaming-Resistant Insurance Contracts for Autonomous AI Agents: Strategy-Proof Toll Mechanism Design

Hao-Hsuan Chen

Paper A defines a time-consistent actuarial runtime that prices each side-effect-bearing action against a contractually fixed safe default and gates...

1 week ago cs.GT cs.AI q-fin.RM PDF

Attack MEDIUM

Invisible Manipulation Channels in AI-Assisted Financial Advisory: Implications for Market Integrity and Regulatory Design

Liuyang Yao, Zhouyu Li, Junguang He +1 more

AI systems are increasingly deployed for credit assessment and investment advisory in global financial markets, yet the integrity of their inference...

1 week ago cs.CR PDF

Attack MEDIUM

Beyond Runtime Enforcement: Shield Synthesis as Defensibility Analysis for Adversarial Networks

Achraf Hsain, Sultan Almuhammadi

Shielded reinforcement learning is typically presented as a runtime safety mechanism that compiles temporal-logic specifications into automata...

2 weeks ago cs.AI cs.CR cs.GT PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
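
As a rough illustration of what an indexed entry might carry, here is a minimal Python sketch. The Paper record, its field names, and the filter_papers helper are hypothetical and are not AI Threat Alert's actual schema or API. Only the type labels, relevance levels, arXiv categories, and relative ages are taken from the listing on this page. No CVE identifiers are filled in, because the listing does not show any.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Paper:
    """Hypothetical record for one indexed paper (illustrative field names)."""
    title: str
    paper_type: str                         # "attack", "defense", "benchmark", "survey", "tool"
    relevance: str                          # "high" or "medium"
    published: date
    arxiv_categories: Tuple[str, ...] = ()
    related_cves: Tuple[str, ...] = ()      # cross-references; empty when none are known


def filter_papers(
    papers: Iterable[Paper],
    paper_type: Optional[str] = None,
    relevance: Optional[str] = None,
    max_age_days: Optional[int] = None,
    today: Optional[date] = None,
) -> List[Paper]:
    """Keep papers matching every filter that is set; None means 'All'."""
    today = today or date.today()
    kept = []
    for paper in papers:
        if paper_type is not None and paper.paper_type != paper_type:
            continue
        if relevance is not None and paper.relevance != relevance:
            continue
        if max_age_days is not None and (today - paper.published).days > max_age_days:
            continue
        kept.append(paper)
    return kept


# Two entries from the listing; absolute dates are derived from the
# relative ages shown ("2 days ago", "2 weeks ago"), not from arXiv.
today = date.today()
papers = [
    Paper(
        "A Deterministic Control Plane for LLM Coding Agents",
        "attack", "medium", today - timedelta(days=2),
        ("cs.SE", "cs.AI", "cs.CR"),
    ),
    Paper(
        "Beyond Runtime Enforcement: Shield Synthesis as Defensibility "
        "Analysis for Adversarial Networks",
        "attack", "medium", today - timedelta(days=14),
        ("cs.AI", "cs.CR", "cs.GT"),
    ),
]

recent = filter_papers(papers, paper_type="attack", relevance="medium",
                       max_age_days=7, today=today)
print([p.title for p in recent])
```

With a 7-day window, only the first entry is kept. Leaving a filter as None corresponds to choosing "All" for that dimension.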

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
