AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total

3,023
Attack

1,175
Benchmark

866
Defense

407
Tool

319
Survey

176

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 61–80 of 319 papers

Clear filters

Tool MEDIUM

LLM-CEG: Extending the Classification Error Gauge Framework for Privacy Auditing of Large Language Models

Kato Mivule

This paper extends the Classification Error Gauge (x-CEG) framework, originally developed for measuring the privacy-utility trade-off in tabular...

2 months ago cs.CR PDF

Tool HIGH

Prompt-Unknown Promotion Attacks against LLM-based Sequential Recommender Systems

Yuchuan Zhao, Tong Chen, Junliang Yu +3 more

Large language model-powered sequential recommender systems (LLM-SRSs) have recently demonstrated remarkable performance, enabling recommendations...

2 months ago cs.IR PDF

Tool HIGH

MCP Pitfall Lab: Exposing Developer Pitfalls in MCP Tool Server Security under Multi-Vector Attacks

Run Hao, Zhuoran Tan

Model Context Protocol (MCP) is increasingly adopted for tool-integrated LLM agents, but its multi-layer design and third-party server ecosystem...

2 months ago cs.CR PDF

Tool MEDIUM

AVISE: Framework for Evaluating the Security of AI Systems

Mikko Lempinen, Joni Kemppainen, Niklas Raesalmi

As artificial intelligence (AI) systems are increasingly deployed across critical domains, their security vulnerabilities pose growing risks of...

2 months ago cs.CR cs.AI cs.CL PDF

Tool LOW

MedSkillAudit: A Domain-Specific Audit Framework for Medical Research Agent Skills

Yingyong Hou, Xinyuan Lao, Huimei Wang +10 more

Background: Agent skills are increasingly deployed as modular, reusable capability units in AI agent systems. Medical research agent skills require...

2 months ago cs.AI PDF

Tool HIGH

If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems

Jiamin Chang, Minhui Xue, Ruoxi Sun +3 more

Recent advances in embodied Vision-Language Agentic Systems (VLAS), powered by large vision-language models (LVLMs), enable AI systems to perceive...

2 months ago cs.CV cs.AI PDF

Tool HIGH

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

Jiacheng Liang, Yao Ma, Tharindu Kumarage +5 more

Reinforcement Learning from Human Feedback (RLHF) is central to aligning Large Language Models (LLMs), yet it introduces a critical vulnerability: an...

2 months ago cs.AI cs.CR cs.LG PDF

Tool MEDIUM

Reverse Constitutional AI: A Framework for Controllable Toxic Data Generation via Probability-Clamped RLAIF

Yuan Fang, Yiming Luo, Aimin Zhou +1 more

Ensuring the safety of large language models (LLMs) requires robust red teaming, yet the systematic synthesis of high-quality toxic data remains...

2 months ago cs.CL cs.AI PDF

Tool LOW

Don't Let AI Agents YOLO Your Files: Shifting Information and Control to Filesystems for Agent Safety and Autonomy

Shawn, Zhong, Junxuan Liao +4 more

AI coding agents operate directly on users' filesystems, where they regularly corrupt data, delete files, and leak secrets. Current approaches force...

2 months ago cs.OS PDF

Tool LOW

LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software

Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu +7 more

Logical vulnerabilities in software stem from flaws in program logic rather than memory safety, which can lead to critical security failures....

2 months ago cs.CR cs.AI PDF

Tool MEDIUM

TimeMark: A Trustworthy Time Watermarking Framework for Exact Generation-Time Recovery from AIGC

Shangkun Che, Silin Du, Ge Gao

The widespread use of Large Language Models (LLMs) in text generation has raised increasing concerns about intellectual property disputes....

2 months ago cs.CR cs.CL PDF

Tool HIGH

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

Wei Zhao, Zhe Li, Peixin Zhang +1 more

Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet...

2 months ago cs.CR cs.AI PDF

Tool HIGH

The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems

Yihao Zhang, Kai Wang, Jiangrong Wu +7 more

Large Language Models (LLMs) face prominent security risks from jailbreaking, a practice that manipulates models to bypass built-in security...

2 months ago cs.CR cs.AI cs.CL PDF

Tool HIGH

Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language Models

Vu Tuan Truong, Long Bao Le

Large Language Models (LLMs), despite their impressive capabilities across domains, have been shown to be vulnerable to backdoor attacks. Prior...

2 months ago cs.CR cs.AI PDF

Tool MEDIUM

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

Hengkai Ye, Zhechang Zhang, Jinyuan Jia +1 more

Large language models (LLMs) increasingly rely on external tools to perform time-sensitive tasks and real-world actions. While tool integration...

2 months ago cs.CR PDF

Tool MEDIUM

TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories

Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang +1 more

As large language models (LLMs) evolve from static chatbots into autonomous agents, the primary vulnerability surface shifts from final outputs to...

2 months ago cs.CR cs.AI cs.CL PDF

Tool MEDIUM

SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills

Yinghan Hou, Zongyou Yang

OpenClaw's ClawHub marketplace hosts over 13,000 community-contributed agent skills, and between 13% and 26% of them contain security vulnerabilities...

2 months ago cs.CR cs.AI PDF

Tool MEDIUM

From Incomplete Architecture to Quantified Risk: Multimodal LLM-Driven Security Assessment for Cyber-Physical Systems

Shaofei Huang, Christopher M. Poskitt, Lwin Khin Shar

Cyber-physical systems often contend with incomplete architectural documentation or outdated information resulting from legacy technologies,...

2 months ago cs.CR cs.AI PDF

Tool MEDIUM

LanG -- A Governance-Aware Agentic AI Platform for Unified Security Operations

Anes Abdennebi, Nadjia Kara, Laaziz Lahlou +1 more

Modern Security Operations Centers struggle with alert fatigue, fragmented tooling, and limited cross-source event correlation. Challenges that...

2 months ago cs.CR cs.AI PDF

Tool MEDIUM

Your LLM Agent Can Leak Your Data: Data Exfiltration via Backdoored Tool Use

Wuyang Zhang, Shichao Pei

Tool-use large language model (LLM) agents are increasingly deployed to support sensitive workflows, relying on tool calls for retrieval, external...

2 months ago cs.CR cs.AI PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial