AI Security Research

AI Threat Alert indexes 3,082+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total: 3,082
Attack: 1,196
Benchmark: 883
Defense: 421
Tool: 321
Survey: 181

Showing 1061–1080 of 3,082 papers

Benchmark MEDIUM

VeriGrey: Greybox Agent Validation

Yuntong Zhang, Sungmin Kang, Ruijie Meng +2 more

Agentic AI has been a topic of great interest recently. A Large Language Model (LLM) agent involves one or more LLMs in the back-end. In the front...

3 months ago cs.AI PDF

Attack MEDIUM

Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare

Saikat Maiti

Autonomous AI agents powered by large language models are being deployed in production with capabilities including shell execution, file system...

3 months ago cs.CR cs.AI PDF

Survey MEDIUM

Is Your LLM-as-a-Recommender Agent Trustable? LLMs' Recommendation is Easily Hacked by Biases (Preferences)

Zichen Tang, Zirui Zhang, Qian Wang +3 more

Current Large Language Models (LLMs) are gradually exploited in practically valuable agentic workflows such as Deep Research, E-commerce...

3 months ago cs.CY cs.MA PDF

Attack HIGH

Understanding and Defending VLM Jailbreaks via Jailbreak-Related Representation Shift

Zhihua Wei, Qiang Li, Jian Ruan +4 more

Large vision-language models (VLMs) often exhibit weakened safety alignment with the integration of the visual modality. Even when text prompts...

3 months ago cs.CV cs.AI PDF

Benchmark LOW

InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning

Chengwei Wei, Jung-jae Kim, Longyin Zhang +2 more

Large Language Models (LLMs) with extended reasoning capabilities often generate verbose and redundant reasoning traces, incurring unnecessary...

3 months ago cs.AI cs.CL PDF

Survey MEDIUM

MCP-38: A Comprehensive Threat Taxonomy for Model Context Protocol Systems (v1.0)

Yi Ting Shen, Kentaroh Toyoda, Alex Leung

The Model Context Protocol (MCP) introduces a structurally distinct attack surface that existing threat frameworks, designed for traditional software...

3 months ago cs.CR cs.AI PDF

Survey MEDIUM

Network and Device Level Cyber Deception for Contested Environments Using RL and LLMs

Abhijeet Sahu, Shuva Paul, Richard Macwan

Cyber deception assists in increasing the attacker's budget in reconnaissance or any early phases of threat intrusions. In the past, numerous methods...

3 months ago cs.CR cs.ET PDF

Attack HIGH

LAAF: Logic-layer Automated Attack Framework: A Systematic Red-Teaming Methodology for LPCI Vulnerabilities in Agentic Large Language Model Systems

Hammad Atta, Ken Huang, Kyriakos Rock Lambros +11 more

Agentic LLM systems equipped with persistent memory, RAG pipelines, and external tool connectors face a class of attacks - Logic-layer Prompt Control...

3 months ago cs.CR PDF

Attack MEDIUM

Towards Unsupervised Adversarial Document Detection in Retrieval Augmented Generation Systems

Patrick Levi

Retrieval augmented generation systems have become an integral part of everyday life. Whether in internet search engines, email systems, or service...

3 months ago cs.CR cs.AI PDF

Attack HIGH

Detecting Data Poisoning in Code Generation LLMs via Black-Box, Vulnerability-Oriented Scanning

Shenao Yan, Shimaa Ahmed, Shan Jin +4 more

Code generation large language models (LLMs) are increasingly integrated into modern software development workflows. Recent work has shown that these...

3 months ago cs.CR cs.AI cs.SE PDF

Tool MEDIUM

Security Assessment and Mitigation Strategies for Large Language Models: A Comprehensive Defensive Framework

Taiwo Onitiju, Iman Vakilinia

Large Language Models increasingly power critical infrastructure from healthcare to finance, yet their vulnerability to adversarial manipulation...

3 months ago cs.CR cs.AI PDF

Attack MEDIUM

An End-to-End Framework for Functionality-Embedded Provenance Graph Construction and Threat Interpretation

Kushankur Ghosh, Mehar Klair, Kian Kyars +2 more

Provenance graphs model causal system-level interactions from logs, enabling anomaly detectors to learn normal behavior and detect deviations as...

3 months ago cs.CR cs.LG PDF

Benchmark LOW

MedCL-Bench: Benchmarking stability-efficiency trade-offs and scaling in biomedical continual learning

Min Zeng, Shuang Zhou, Zaifu Zhan +1 more

Medical language models must be updated as evidence and terminology evolve, yet sequential updating can trigger catastrophic forgetting. Although...

3 months ago cs.AI PDF

Benchmark MEDIUM

Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure

Caglar Yildirim

Large language models (LLMs) are increasingly deployed as tool-using agents, shifting safety concerns from harmful text generation to harmful task...

3 months ago cs.AI PDF

Attack HIGH

REFORGE: Multi-modal Attacks Reveal Vulnerable Concept Unlearning in Image Generation Models

Yong Zou, Haoran Li, Fanxiao Li +5 more

Recent progress in image generation models (IGMs) enables high-fidelity content creation but also amplifies risks, including the reproduction of...

3 months ago cs.CV cs.AI cs.CR PDF

Attack HIGH

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Guangsheng Zhang, Huan Tian, Leo Zhang +4 more

Semantic segmentation models are widely deployed in safety-critical applications such as autonomous driving, yet their vulnerability to backdoor...

3 months ago cs.CR PDF

Attack HIGH

Rotated Robustness: A Training-Free Defense against Bit-Flip Attacks on Large Language Models

Deng Liu, Song Chen

Hardware faults, specifically bit-flips in quantized weights, pose a severe reliability threat to Large Language Models (LLMs), often triggering...

3 months ago cs.CR PDF

Benchmark MEDIUM

CoMAI: A Collaborative Multi-Agent Framework for Robust and Equitable Interview Evaluation

Gengxin Sun, Ruihao Yu, Liangyi Yin +3 more

Ensuring robust and fair interview assessment remains a key challenge in AI-driven evaluation. This paper presents CoMAI, a general-purpose...

3 months ago cs.MA cs.AI PDF

Attack HIGH

Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models

Xiaobing Sun, Perry Lam, Shaohua Li +4 more

Modern LLMs employ safety mechanisms that extend beyond surface-level input filtering to latent semantic representations and generation-time...

3 months ago cs.CL PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,082+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
