AI Security Research

AI Threat Alert indexes 3,082+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Total: 3,082
Attack: 1,196
Benchmark: 883
Defense: 421
Tool: 321
Survey: 181

Showing 1,481–1,500 of 3,082 papers

Survey MEDIUM

Following Dragons: Code Review-Guided Fuzzing

Viet Hoang Luu, Amirmohammad Pasdar, Wachiraphan Charoenwet +3 more

Modern fuzzers scale to large, real-world software but often fail to exercise the program states developers consider most fragile or...

4 months ago cs.CR cs.SE PDF

Benchmark MEDIUM

Protecting Context and Prompts: Deterministic Security for Non-Deterministic AI

Mohan Rajagopalan, Vinay Rao

Large Language Model (LLM) applications are vulnerable to prompt injection and context manipulation attacks that traditional security models cannot...

4 months ago cs.CR cs.AI cs.MA PDF

Survey MEDIUM

Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations

Ashwath Vaithinathan Aravindan, Mayank Kejriwal

Chain-of-Thought (CoT) prompting has emerged as a foundational technique for eliciting reasoning from Large Language Models (LLMs), yet the...

4 months ago cs.CL cs.AI cs.LG PDF

Survey HIGH

The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis

Peiran Wang, Xinfeng Li, Chong Xiang +5 more

The evolution of Large Language Models (LLMs) has resulted in a paradigm shift towards autonomous agents, necessitating robust security against...

4 months ago cs.CR cs.CL PDF

Benchmark LOW

HII-DPO: Eliminate Hallucination via Accurate Hallucination-Inducing Counterfactual Images

Yilin Yang, Zhenghui Guo, Yuke Wang +3 more

Large Vision-Language Models (VLMs) have achieved remarkable success across diverse multimodal tasks but remain vulnerable to hallucinations rooted...

4 months ago cs.CV PDF

Defense MEDIUM

SecCodePRM: A Process Reward Model for Code Security

Weichen Yu, Ravi Mangal, Yinyi Luo +4 more

Large Language Models are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging....

4 months ago cs.CR cs.SE PDF

Attack HIGH

Detecting Jailbreak Attempts in Clinical Training LLMs Through Automated Linguistic Feature Extraction

Tri Nguyen, Huy Hoang Bao Le, Lohith Srikanth Pentapalli +2 more

Detecting jailbreak attempts in clinical training large language models (LLMs) requires accurate modeling of linguistic deviations that signal unsafe...

4 months ago cs.AI cs.LG PDF

Benchmark HIGH

Red Teaming LLMs as Socio-Technical Practice: From Exploration and Data Creation to Evaluation

Adriana Alvarado Garcia, Ruyuan Wan, Ozioma C. Oguine +1 more

Recently, red teaming, with roots in security, has become a key evaluative approach to ensure the safety and reliability of Generative Artificial...

4 months ago cs.CY cs.AI cs.CL PDF

Survey HIGH

QRS: A Rule-Synthesizing Neuro-Symbolic Triad for Autonomous Vulnerability Discovery

George Tsigkourakos, Constantinos Patsakis

Static Application Security Testing (SAST) tools are integral to modern DevSecOps pipelines, yet tools like CodeQL, Semgrep, and SonarQube remain...

4 months ago cs.CR PDF

Defense LOW

The Hidden Costs of Domain Fine-Tuning: PII-Bearing Data Degrades Safety and Increases Leakage

Jayesh Choudhari, Piyush Kumar Singh

Domain fine-tuning is a common path to deploy small instruction-tuned language models as customer-support assistants, yet its effects on...

4 months ago cs.CR cs.LG PDF

Tool HIGH

Stop Testing Attacks, Start Diagnosing Defenses: The Four-Checkpoint Framework Reveals Where LLM Safety Breaks

Hayfa Dhabhi, Kashyap Thimmaraju

Large Language Models (LLMs) deploy safety mechanisms to prevent harmful outputs, yet these defenses remain vulnerable to adversarial prompts. While...

4 months ago cs.CR cs.AI cs.CY PDF

Defense MEDIUM

Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment

Kun Wang, Zherui Li, Zhenhong Zhou +8 more

Omni-modal Large Language Models (OLLMs) greatly expand LLMs' multimodal capabilities but also introduce cross-modal safety risks. However, a...

4 months ago cs.CR cs.AI cs.CL PDF

Attack MEDIUM

A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors

Zhenyu Xu, Victor S. Sheng

Protecting the intellectual property of large language models (LLMs) is a critical challenge due to the proliferation of unauthorized derivative...

4 months ago cs.CR cs.AI PDF

Tool MEDIUM

Autonomous Action Runtime Management (AARM): A System Specification for Securing AI-Driven Actions at Runtime

Herman Errico

As artificial intelligence systems evolve from passive assistants into autonomous agents capable of executing consequential actions, the security...

4 months ago cs.CR cs.AI PDF

Benchmark LOW

Reward Modeling for Reinforcement Learning-Based LLM Reasoning: Design, Challenges, and Evaluation

Pei-Chi Pan, Yingbin Liang, Sen Lin

Large Language Models (LLMs) demonstrate transformative potential, yet their reasoning remains inconsistent and unreliable. Reinforcement learning...

4 months ago cs.LG PDF

Benchmark HIGH

CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation

Chaeyun Kim, YongTaek Lim, Kihyun Kim +2 more

Existing red-teaming benchmarks, when adapted to new languages via direct translation, fail to capture socio-technical vulnerabilities rooted in...

4 months ago cs.CY cs.AI PDF

Attack HIGH

MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

Georgios Syros, Evan Rose, Brian Grinstead +4 more

Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by directly interacting with web sites and...

4 months ago cs.CR cs.AI PDF

Attack HIGH

One RNG to Rule Them All: How Randomness Becomes an Attack Vector in Machine Learning

Kotekar Annapoorna Prabhu, Andrew Gan, Zahra Ghodsi

Machine learning relies on randomness as a fundamental component in various steps such as data sampling, data augmentation, weight initialization,...

4 months ago cs.CR cs.LG PDF

Benchmark LOW

Basic Legibility Protocols Improve Trusted Monitoring

Ashwin Sreevatsa, Sebastian Prasanna, Cody Rushing

The AI Control research agenda aims to develop control protocols: safety techniques that prevent untrusted AI systems from taking harmful actions...

4 months ago cs.CR cs.LG cs.SE PDF

Benchmark MEDIUM

When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents

Yuting Ning, Jaylen Jones, Zhehao Zhang +5 more

Computer-use agents (CUAs) have made tremendous progress in the past year, yet they still frequently produce misaligned actions that deviate from the...

4 months ago cs.CL PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,082+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
