AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 341–360 of 1,455 papers

Benchmark MEDIUM

Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms

Ari Azarafrooz

AI-agent guardrails are memoryless: each message is judged in isolation, so an adversary who spreads a single attack across dozens of sessions slips...

2 months ago cs.CR cs.AI cs.CL PDF

Benchmark MEDIUM

Residual Risk Analysis in Benign Code: How Far Are We? A Multi-Model Semantic and Structural Similarity Approach

Mohammad Farhad, Shuvalaxmi Dass

Software security relies on effective vulnerability detection and patching, yet determining whether a patch fully eliminates risk remains an...

2 months ago cs.SE cs.CR PDF

Tool MEDIUM

AVISE: Framework for Evaluating the Security of AI Systems

Mikko Lempinen, Joni Kemppainen, Niklas Raesalmi

As artificial intelligence (AI) systems are increasingly deployed across critical domains, their security vulnerabilities pose growing risks of...

2 months ago cs.CR cs.AI cs.CL PDF

Defense MEDIUM

Breaking Bad: Interpretability-Based Safety Audits of State-of-the-Art LLMs

Krishiv Agarwal, Ramneet Kaur, Colin Samplawski +6 more

Effective safety auditing of large language models (LLMs) demands tools that go beyond black-box probing and systematically uncover vulnerabilities...

2 months ago cs.CR cs.LG PDF

Benchmark MEDIUM

Anchor-and-Resume Concession Under Dynamic Pricing for LLM-Augmented Freight Negotiation

Hoang Nguyen, Lu Wang, Marta Gaia Bras

Freight brokerages negotiate thousands of carrier rates daily under dynamic pricing conditions where models frequently revise targets...

2 months ago cs.MA cs.AI cs.CL PDF

Attack MEDIUM

Auto-ART: Structured Literature Synthesis and Automated Adversarial Robustness Testing

Abhijit Talluri

Adversarial robustness evaluation underpins every claim of trustworthy ML deployment, yet the field suffers from fragmented protocols and undetected...

2 months ago cs.CR cs.LG PDF

Other MEDIUM

LayerTracer: A Joint Task-Particle and Vulnerable-Layer Analysis framework for Arbitrary Large Language Model Architectures

Yuhang Wu, Qinyuan Liu, Qiuyang Zhao +1 more

Currently, Large Language Models (LLMs) feature a diversified architectural landscape, including traditional Transformer, GateDeltaNet, and Mamba....

2 months ago cs.CL cs.AI PDF

Defense MEDIUM

SafeRedirect: Defeating Internal Safety Collapse via Task-Completion Redirection in Frontier LLMs

Chao Pan, Yu Wu, Xin Yao

Internal Safety Collapse (ISC) is a failure mode in which frontier LLMs, when executing legitimate professional tasks whose correct completion...

2 months ago cs.CR cs.AI cs.LG PDF

Benchmark MEDIUM

Towards Secure Logging: Characterizing and Benchmarking Logging Code Security Issues with LLMs

He Yang Yuan, Xin Wang, Kundi Yao +3 more

Logging code plays an important role in software systems by recording key events and behaviors, which are essential for debugging and monitoring....

2 months ago cs.SE cs.AI cs.CR PDF

Benchmark MEDIUM

Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages

Girish, Mohd Mujtaba Akhtar, Orchid Chetia Phukan +1 more

The rapid advancement of Audio Large Language Models (ALMs), driven by Neural Audio Codecs (NACs), has led to the emergence of highly realistic...

2 months ago eess.AS PDF

Benchmark MEDIUM

An AI Agent Execution Environment to Safeguard User Data

Robert Stanley, Avi Verma, Lillian Tsai +2 more

AI agents promise to serve as general-purpose personal assistants for their users, which requires them to have access to private user data (e.g.,...

2 months ago cs.CR cs.AI cs.OS PDF

Benchmark MEDIUM

Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps

Alankrit Chona, Igor Kozlov, Ambuj Kumar

We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of...

2 months ago cs.CR cs.AI PDF

Defense MEDIUM

Evaluating LLM-Generated Obfuscated XSS Payloads for Machine Learning-Based Detection

Divyesh Gabbireddy, Suman Saha

Cross-site scripting (XSS) remains a persistent web security vulnerability, especially because obfuscation can change the surface form of a malicious...

2 months ago cs.CR cs.LG cs.SE PDF

Defense MEDIUM

Malicious ML Model Detection by Learning Dynamic Behaviors

Sarang Nambiar, Dhruv Pradhan, Ezekiel Soremekun

Pre-trained machine learning models (PTMs) are commonly provided via Model Hubs (e.g., Hugging Face) in standard formats like Pickles to facilitate...

2 months ago cs.CR cs.SE PDF

Benchmark MEDIUM

Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges

Ali Al-Kaswan, Maksim Plotnikov, Maxim Hájek +3 more

Large Language Model (LLM) agents are increasingly proposed for autonomous cybersecurity tasks, but their capabilities in realistic offensive...

2 months ago cs.AI cs.CR cs.SE PDF

Defense MEDIUM

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

Kun Wang, Cheng Qian, Miao Yu +6 more

Multimodal Large Language Models (MLLMs) have achieved remarkable success in cross-modal understanding and generation, yet their deployment is...

2 months ago cs.CR cs.AI PDF

Defense MEDIUM

Mechanistic Anomaly Detection via Functional Attribution

Hugo Lyons Keenan, Christopher Leckie, Sarah Erfani

We can often verify the correctness of neural network outputs using ground truth labels, but we cannot reliably determine whether the output was...

2 months ago cs.LG cs.CR PDF

Benchmark MEDIUM

Towards Understanding the Robustness of Sparse Autoencoders

Ahson Saiyed, Sabrina Sadiekh, Chirag Agarwal

Large Language Models (LLMs) remain vulnerable to optimization-based jailbreak attacks that exploit internal gradient structure. While Sparse...

2 months ago cs.LG cs.AI cs.CL PDF

Attack MEDIUM

Beyond Indistinguishability: Measuring Extraction Risk in LLM APIs

Ruixuan Liu, David Evans, Li Xiong

Indistinguishability properties such as differential privacy bounds or low empirically measured membership inference are widely treated as proxies to...

2 months ago cs.CR cs.CL cs.LG PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
