AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total

3,023
Attack

1,175
Benchmark

866
Defense

407
Tool

319
Survey

176

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 241–260 of 521 papers

Clear filters

Benchmark MEDIUM

Protecting Context and Prompts: Deterministic Security for Non-Deterministic AI

Mohan Rajagopalan, Vinay Rao

Large Language Model (LLM) applications are vulnerable to prompt injection and context manipulation attacks that traditional security models cannot...

4 months ago cs.CR cs.AI cs.MA PDF

Benchmark MEDIUM

When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents

Yuting Ning, Jaylen Jones, Zhehao Zhang +5 more

Computer-use agents (CUAs) have made tremendous progress in the past year, yet they still frequently produce misaligned actions that deviate from the...

4 months ago cs.CL PDF

Benchmark MEDIUM

When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment

Igor Santos-Grueiro

Safety evaluation for advanced AI systems assumes that behavior observed under evaluation predicts behavior in deployment. This assumption weakens...

4 months ago cs.AI cs.CR cs.LG PDF

Benchmark MEDIUM

RIFLE: Robust Distillation-based FL for Deep Model Deployment on Resource-Constrained IoT Networks

Pouria Arefijamal, Mahdi Ahmadlou, Bardia Safaei +1 more

Federated learning (FL) is a decentralized learning paradigm widely adopted in resource-constrained Internet of Things (IoT) environments. These...

4 months ago cs.LG cs.CR cs.DC PDF

Benchmark MEDIUM

On Protecting Agentic Systems' Intellectual Property via Watermarking

Liwen Wang, Zongjie Li, Yuchong Xie +4 more

The evolution of Large Language Models (LLMs) into agentic systems that perform autonomous reasoning and tool use has created significant...

4 months ago cs.AI cs.CR PDF

Benchmark MEDIUM

Moral Sycophancy in Vision Language Models

Shadman Rabby, Md. Hefzul Hossain Papon, Sabbir Ahmed +3 more

Sycophancy in Vision-Language Models (VLMs) refers to their tendency to align with user opinions, often at the expense of moral or factual accuracy....

4 months ago cs.AI PDF

Benchmark MEDIUM

Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents

Sai Puppala, Ismail Hossain, Md Jahangir Alam +5 more

Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety...

4 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

NAAMSE: Framework for Evolutionary Security Evaluation of Agents

Kunal Pai, Parth Shah, Harshil Patel

AI agents are increasingly deployed in production, yet their security evaluations remain bottlenecked by manual red-teaming or static benchmarks that...

4 months ago cs.AI cs.MA PDF

Benchmark MEDIUM

Aegis: Towards Governance, Integrity, and Security of AI Voice Agents

Xiang Li, Pin-Yu Chen, Wenqi Wei

With the rapid advancement and adoption of Audio Large Language Models (ALLMs), voice agents are now being deployed in high-stakes domains such as...

4 months ago cs.CR cs.MA PDF

Benchmark MEDIUM

Beyond Crash: Hijacking Your Autonomous Vehicle for Fun and Profit

Qi Sun, Ahmed Abdo, Luis Burbano +4 more

Autonomous Vehicles (AVs), especially vision-based AVs, are rapidly being deployed without human operators. As AVs operate in safety-critical...

4 months ago cs.CR cs.LG PDF

Benchmark MEDIUM

Confundo: Learning to Generate Robust Poison for Practical RAG Systems

Haoyang Hu, Zhejun Jiang, Yueming Lyu +3 more

Retrieval-augmented generation (RAG) is increasingly deployed in real-world applications, where its reference-grounded design makes outputs appear...

4 months ago cs.CR cs.LG PDF

Benchmark MEDIUM

Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

Yi Liu, Zhihao Chen, Yanjun Zhang +5 more

Third-party agent skills extend LLM-based agents with instruction files and executable code that run on users' machines. Skills execute with user...

4 months ago cs.CR cs.AI cs.CL PDF

Benchmark MEDIUM

Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in Inference-Time Interventions

Navita Goyal, Hal Daumé

Model steering, which involves intervening on hidden representations at inference time, has emerged as a lightweight alternative to finetuning for...

4 months ago cs.LG cs.AI cs.CL PDF

Benchmark MEDIUM

Private and interpretable clinical prediction with quantum-inspired tensor train models

José Ramón Pareja Monturiol, Juliette Sinnott, Roger G. Melko +1 more

Machine learning in clinical settings must balance predictive accuracy, interpretability, and privacy. Models such as logistic regression (LR) offer...

4 months ago cs.LG cs.CR quant-ph PDF

Benchmark MEDIUM

Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?

Ruixin Yang, Ethan Mendes, Arthur Wang +4 more

Vision-language models (VLMs) have demonstrated strong performance in image geolocation, a capability further sharpened by frontier multimodal large...

4 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases

Casey Ford, Madison Van Doren, Emily Dix

Multimodal large language models (MLLMs) are increasingly deployed in real-world systems, yet their safety under adversarial prompting remains...

4 months ago cs.CL cs.AI cs.HC PDF

Benchmark MEDIUM

Trust The Typical

Debargha Ganguly, Sreehari Sankar, Biyao Zhang +8 more

Current approaches to LLM safety fundamentally rely on a brittle cat-and-mouse game of identifying and blocking known threats via guardrails. We...

4 months ago cs.CL cs.AI cs.DC PDF

Benchmark MEDIUM

SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

Omar Abdelnasser, Fatemah Alharbi, Khaled Khasawneh +2 more

Safety alignment in Language Models (LMs) is fundamental for trustworthy AI. However, while different stakeholders are trying to leverage Arabic...

4 months ago cs.CL cs.AI PDF

Benchmark MEDIUM

Extracting Recurring Vulnerabilities from Black-Box LLM-Generated Software

Tomer Kordonsky, Maayan Yamin, Noam Benzimra +2 more

LLMs are increasingly used for code generation, but their outputs often follow recurring templates that can induce predictable vulnerabilities. We...

4 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Benchmarking Large Language Models for Zero-shot and Few-shot Phishing URL Detection

Najmul Hasan, Prashanth BusiReddyGari

The Uniform Resource Locator (URL), introduced in a connectivity-first era to define access and locate resources, remains historically limited,...

4 months ago cs.CR cs.AI PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial