AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 701–720 of 866 papers

Benchmark MEDIUM

On Stealing Graph Neural Network Models

Marcin Podhajski, Jan Dubiński, Franziska Boenisch +3 more

Current graph neural network (GNN) model-stealing methods rely heavily on queries to the victim model, assuming no hard query limits. However, in...

7 months ago cs.LG cs.CR

Benchmark MEDIUM

EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLMs as Simulated Teachers

Yilin Jiang, Mingzi Zhang, Xuanyu Yin +5 more

Large Language Models for Simulating Professions (SP-LLMs), particularly as teachers, are pivotal for personalized education. However, ensuring their...

7 months ago cs.CL

Benchmark MEDIUM

Sensitivity of Small Language Models to Fine-tuning Data Contamination

Nicy Scaria, Silvester John Joseph Kennedy, Deepak Subramani

Small Language Models (SLMs) are increasingly being deployed in resource-constrained environments, yet their behavioral robustness to data...

7 months ago cs.CL cs.AI

Benchmark MEDIUM

Efficient LLM Safety Evaluation through Multi-Agent Debate

Dachuan Lin, Guobin Shen, Zihao Yang +3 more

Safety evaluation of large language models (LLMs) increasingly relies on LLM-as-a-judge pipelines, but strong judges can still be expensive to use at...

7 months ago cs.AI cs.CR

Benchmark LOW

Secu-Table: a Comprehensive security table dataset for evaluating semantic table interpretation systems

Azanzi Jiomekong, Jean Bikim, Patricia Negoue +1 more

Evaluating semantic tables interpretation (STI) systems, (particularly, those based on Large Language Models- LLMs) especially in domain-specific...

7 months ago cs.AI

Benchmark MEDIUM

ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations

Amr Gomaa, Ahmed Salem, Sahar Abdelnabi

As language models evolve into autonomous agents that act and communicate on behalf of users, ensuring safety in multi-agent ecosystems becomes a...

7 months ago cs.CR cs.CL cs.CY

Benchmark MEDIUM

TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems

Ishan Kavathekar, Hemang Jain, Ameya Rathod +2 more

Large Language Models (LLMs) have demonstrated strong capabilities as autonomous agents through tool use, planning, and decision-making abilities,...

7 months ago cs.MA cs.AI

Benchmark MEDIUM

Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding

Hadi Reisizadeh, Jiajun Ruan, Yiwei Chen +3 more

Unlearning in large language models (LLMs) is critical for regulatory compliance and for building ethical generative AI systems that avoid producing...

7 months ago cs.LG

Benchmark MEDIUM

From Model to Breach: Towards Actionable LLM-Generated Vulnerabilities Reporting

Cyril Vallez, Alexander Sternfeld, Andrei Kucharavy +1 more

As the role of Large Language Models (LLM)-based coding assistants in software development becomes more critical, so does the role of the bugs they...

7 months ago cs.CL

Benchmark MEDIUM

Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback

Shiyin Lin

Software fuzzing has become a cornerstone in automated vulnerability discovery, yet existing mutation strategies often lack semantic awareness,...

7 months ago cs.CR cs.AI

Benchmark MEDIUM

Evaluating Control Protocols for Untrusted AI Agents

Jon Kutasov, Chloe Loughridge, Yuqi Sun +4 more

As AI systems become more capable and widely deployed as agents, ensuring their safe operation becomes critical. AI control offers one approach to...

7 months ago cs.AI

Benchmark MEDIUM

On The Dangers of Poisoned LLMs In Security Automation

Patrick Karlsen, Even Eilertsen

This paper investigates some of the risks introduced by "LLM poisoning," the intentional or unintentional introduction of malicious or biased data...

7 months ago cs.CR cs.AI

Benchmark MEDIUM

ConneX: Automatically Resolving Transaction Opacity of Cross-Chain Bridges for Security Analysis

Hanzhong Liang, Yue Duan, Xing Su +5 more

As the Web3 ecosystem evolves toward a multi-chain architecture, cross-chain bridges have become critical infrastructure for enabling...

7 months ago cs.CR

Benchmark LOW

Lares: LLM-driven Code Slice Semantic Search for Patch Presence Testing

Siyuan Li, Yaowen Zheng, Hong Li +7 more

In modern software ecosystems, 1-day vulnerabilities pose significant security risks due to extensive code reuse. Identifying vulnerable functions in...

7 months ago cs.SE

Benchmark MEDIUM

Exploring and Mitigating Gender Bias in Encoder-Based Transformer Models

Ariyan Hossain, Khondokar Mohammad Ahanaf Hannan, Rakinul Haque +4 more

Gender bias in language models has gained increasing attention in the field of natural language processing. Encoder-based transformer models, which...

7 months ago cs.CL

Benchmark MEDIUM

Self-HarmLLM: Can Large Language Model Harm Itself?

Heehwan Kim, Sungjune Park, Daeseon Choi

Large Language Models (LLMs) are generally equipped with guardrails to block the generation of harmful responses. However, existing defenses always...

8 months ago cs.CL cs.AI

Benchmark MEDIUM

Adapting Large Language Models to Emerging Cybersecurity using Retrieval Augmented Generation

Arnabh Borah, Md Tanvirul Alam, Nidhi Rastogi

Security applications are increasingly relying on large language models (LLMs) for cyber threat detection; however, their opaque reasoning often...

8 months ago cs.CR cs.AI

Benchmark MEDIUM

Reasoning Up the Instruction Ladder for Controllable Language Models

Zishuo Zheng, Vidhisha Balachandran, Chan Young Park +2 more

As large language model (LLM) based systems take on high-stakes roles in real-world decision-making, they must reconcile competing instructions from...

8 months ago cs.CL cs.AI

Benchmark LOW

Using Copilot Agent Mode to Automate Library Migration: A Quantitative Assessment

Aylton Almeida, Laerte Xavier, Marco Tulio Valente

Keeping software systems up to date is essential to avoid technical debt, security vulnerabilities, and the rigidity typical of legacy systems....

8 months ago cs.SE

Benchmark MEDIUM

Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token

Shaked Zychlinski, Yuval Kainan

Large Language Models (LLMs) are susceptible to jailbreak attacks where malicious prompts are disguised using ciphers and character-level encodings...

8 months ago cs.CR cs.AI cs.CL

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
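
As a rough illustration of the classification described above, an indexed paper could be modeled as a small record carrying its type, relevance label, and arXiv categories. This is only a sketch: the `Paper` class, its field names, the `related_cves` field, and `filter_papers` are hypothetical and are not AI Threat Alert's actual schema or API. The sample records reuse titles and labels from the listing on this page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Tuple


class PaperType(Enum):
    ATTACK = "Attack"
    DEFENSE = "Defense"
    BENCHMARK = "Benchmark"
    SURVEY = "Survey"
    TOOL = "Tool"


class Relevance(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


@dataclass(frozen=True)
class Paper:
    """Hypothetical record for one indexed paper (not the site's real schema)."""
    title: str
    paper_type: PaperType
    relevance: Relevance
    arxiv_categories: Tuple[str, ...] = ()
    related_cves: Tuple[str, ...] = ()  # assumed cross-reference field


def filter_papers(
    papers: Iterable[Paper],
    paper_type: Optional[PaperType] = None,
    relevance: Optional[Relevance] = None,
) -> List[Paper]:
    """Keep papers matching every filter that is set; None means 'All'."""
    return [
        p
        for p in papers
        if (paper_type is None or p.paper_type is paper_type)
        and (relevance is None or p.relevance is relevance)
    ]


# Sample records taken from the listing on this page.
SAMPLE = [
    Paper("On Stealing Graph Neural Network Models",
          PaperType.BENCHMARK, Relevance.MEDIUM, ("cs.LG", "cs.CR")),
    Paper("Lares: LLM-driven Code Slice Semantic Search for Patch Presence Testing",
          PaperType.BENCHMARK, Relevance.LOW, ("cs.SE",)),
    Paper("Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token",
          PaperType.BENCHMARK, Relevance.MEDIUM, ("cs.CR", "cs.AI", "cs.CL")),
]

medium_benchmarks = filter_papers(
    SAMPLE, paper_type=PaperType.BENCHMARK, relevance=Relevance.MEDIUM
)
```

Treating an unset filter as "All" mirrors how a listing page typically combines independent type and relevance filters.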

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
