AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176


Showing papers 401–420 of 1,455 matching the current filters


Benchmark MEDIUM

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models

Xiaomeng Hu, Yinger Zhang, Fei Huang +7 more

AI agents are expected to perform professional work across hundreds of occupational domains (from emergency department triage to nuclear reactor...

2 months ago cs.CL PDF

Benchmark MEDIUM

DuCodeMark: Dual-Purpose Code Dataset Watermarking via Style-Aware Watermark-Poison Design

Yuchen Chen, Yuan Xiao, Chunrong Fang +2 more

The proliferation of large language models for code (CodeLMs) and open-source contributions has heightened concerns over unauthorized use of source...

2 months ago cs.CR PDF

Defense MEDIUM

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

Xuwei Ding, Skylar Zhai, Linxin Song +6 more

Computer-use agents (CUAs) can now autonomously complete complex tasks in real digital environments, but when misled, they can also be used to...

2 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing

Wenhao Yuan, Chenchen Lin, Jian Chen +3 more

In large language model (LLM) agents, reasoning trajectories are treated as reliable internal beliefs for guiding actions and updating memory....

2 months ago cs.AI cs.CL PDF

Attack MEDIUM

Phantasia: Context-Adaptive Backdoors in Vision Language Models

Nam Duong Tran, Phi Le Nguyen

Recent advances in Vision-Language Models (VLMs) have greatly enhanced the integration of visual perception and linguistic reasoning, driving rapid...

2 months ago cs.CV cs.AI PDF

Attack MEDIUM

Security Concerns in Generative AI Coding Assistants: Insights from Online Discussions on GitHub Copilot

Nicolás E. Díaz Ferreyra, Monika Swetha Gurupathi, Zadia Codabux +2 more

Generative Artificial Intelligence (GenAI) has become a central component of many development tools (e.g., GitHub Copilot) that support software...

2 months ago cs.SE cs.CR cs.HC PDF

Defense MEDIUM

Towards Identification and Intervention of Safety-Critical Parameters in Large Language Models

Weiwei Qi, Zefeng Wu, Tianhang Zheng +4 more

Ensuring Large Language Model (LLM) safety is crucial, yet the lack of a clear understanding about safety mechanisms hinders the development of...

2 months ago cs.CR PDF

Attack MEDIUM

TADP-RME: A Trust-Adaptive Differential Privacy Framework for Enhancing Reliability of Data-Driven Systems

Labani Halder, Payel Sadhukhan, Sarbani Palit

Ensuring reliability in adversarial settings necessitates treating privacy as a foundational component of data-driven systems. While differential...

2 months ago cs.CR cs.AI cs.LG PDF

Defense MEDIUM

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

Rui Zhang, Hongwei Li, Yun Shen +6 more

The deployment of large language models (LLMs) raises significant ethical and safety concerns. While LLM alignment techniques are adopted to improve...

2 months ago cs.CR cs.CL PDF

Benchmark MEDIUM

ADAG: Automatically Describing Attribution Graphs

Aryaman Arora, Zhengxuan Wu, Jacob Steinhardt +1 more

In language model interpretability research, circuit tracing aims to identify which internal features causally contributed to a particular...

2 months ago cs.CL PDF

Survey MEDIUM

MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security

Mehrdad Rostamzadeh, Sidhant Narula, Nahom Birhan +2 more

The Model Context Protocol (MCP) enables large language models (LLMs) to dynamically discover and invoke third-party tools, significantly expanding...

2 months ago cs.CR cs.AI PDF

Tool MEDIUM

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

Hengkai Ye, Zhechang Zhang, Jinyuan Jia +1 more

Large language models (LLMs) increasingly rely on external tools to perform time-sensitive tasks and real-world actions. While tool integration...

2 months ago cs.CR PDF

Benchmark MEDIUM

ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training

Yu Liang, Liangxin Liu, Longzheng Wang +5 more

Generative reward models (GRMs) have emerged as a promising approach for aligning Large Language Models (LLMs) with human preferences by offering...

2 months ago cs.AI cs.CL cs.LG PDF

Benchmark MEDIUM

Validated Intent Compilation for Constrained Routing in LEO Mega-Constellations

Yuanhang Li

Operating LEO mega-constellations requires translating high-level operator intents ("reroute financial traffic away from polar links under 80 ms")...

2 months ago cs.CR cs.AI PDF

Tool MEDIUM

TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories

Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang +1 more

As large language models (LLMs) evolve from static chatbots into autonomous agents, the primary vulnerability surface shifts from final outputs to...

2 months ago cs.CR cs.AI cs.CL PDF

Attack MEDIUM

Evaluating PQC KEMs, Combiners, and Cascade Encryption via Adaptive IND-CPA Testing Using Deep Learning

Simon Calderon, Niklas Johansson, Onur Günlü

Ensuring ciphertext indistinguishability is fundamental to cryptographic security, but empirically validating this property in real implementations...

2 months ago cs.CR cs.IT cs.LG PDF

Defense MEDIUM

SentinelSphere: Integrating AI-Powered Real-Time Threat Detection with Cybersecurity Awareness Training

Nikolaos D. Tantaroudas, Ilias Karachalios, Andrew J. McCracken

The field of cybersecurity is confronted with two interrelated challenges: a worldwide deficit of qualified practitioners and ongoing human-factor...

2 months ago cs.CE cs.AI cs.CR PDF

Tool MEDIUM

SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills

Yinghan Hou, Zongyou Yang

OpenClaw's ClawHub marketplace hosts over 13,000 community-contributed agent skills, and between 13% and 26% of them contain security vulnerabilities...

2 months ago cs.CR cs.AI PDF

Defense MEDIUM

VLMShield: Efficient and Robust Defense of Vision-Language Models against Malicious Prompts

Peigui Qi, Kunsheng Tang, Yanpu Yu +7 more

Vision-Language Models (VLMs) face significant safety vulnerabilities from malicious prompt attacks due to weakened alignment during visual...

2 months ago cs.LG PDF

Attack MEDIUM

Exclusive Unlearning

Mutsumi Sasaki, Kouta Nakayama, Yusuke Miyao +2 more

When introducing Large Language Models (LLMs) into industrial applications, such as healthcare and education, the risk of generating harmful content...

2 months ago cs.CL PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classifies them into attack, defense, benchmark, survey, and tool categories, and updates the index continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial