AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total

3,023
Attack

1,175
Benchmark

866
Defense

407
Tool

319
Survey

176

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 361–380 of 1,455 papers

Clear filters

Benchmark MEDIUM

AgenTEE: Confidential LLM Agent Execution on Edge Devices

Sina Abdollahi, Mohammad M Maheri, Javad Forough +5 more

Large Language Model (LLM) agents provide powerful automation capabilities, but they also create a substantially broader attack surface than...

2 months ago cs.CR cs.OS PDF

Defense MEDIUM

Committed SAE-Feature Traces for Audited-Session Substitution Detection in Hosted LLMs

Ziyang Liu

Hosted-LLM providers have a silent-substitution incentive: advertise a stronger model while serving cheaper replies. Probe-after-return schemes such...

2 months ago cs.CR cs.AI PDF

Defense MEDIUM

Owner-Harm: A Missing Threat Model for AI Agent Safety

Dongcheng Zhang, Yiqing Jiang

Existing AI agent safety benchmarks focus on generic criminal harm (cybercrime, harassment, weapon synthesis), leaving a systematic blind spot for a...

2 months ago cs.CR cs.AI cs.CL PDF

Benchmark MEDIUM

Spatiotemporal Sycophancy: Negation-Based Gaslighting in Video Large Language Models

Ziyao Tang, Pengkun Jiao, Bin Zhu +3 more

Video Large Language Models (Vid-LLMs) have demonstrated remarkable performance in video understanding tasks, yet their robustness under...

2 months ago cs.CV PDF

Defense MEDIUM

TitanCA: Lessons from Orchestrating LLM Agents to Discover 100+ CVEs

Ting Zhang, Yikun Li, Chengran Yang +15 more

Software vulnerabilities remain one of the most persistent threats to modern digital infrastructure. While static application security testing (SAST)...

2 months ago cs.CR PDF

Benchmark MEDIUM

Privacy-Preserving Product-Quantized Approximate Nearest Neighbor Search Framework for Large-scale Datasets via A Hybrid of Fully Homomorphic Encryption and Trusted Execution Environment

Shozo Saeki, Minoru Kawahara, Hirohisa Aman

A nearest-neighbor framework is a fundamental tool for various applications involving Large Language Models (LLMs) and Visual Language Models (VLMs)....

2 months ago cs.CR PDF

Tool MEDIUM

Reverse Constitutional AI: A Framework for Controllable Toxic Data Generation via Probability-Clamped RLAIF

Yuan Fang, Yiming Luo, Aimin Zhou +1 more

Ensuring the safety of large language models (LLMs) requires robust red teaming, yet the systematic synthesis of high-quality toxic data remains...

2 months ago cs.CL cs.AI PDF

Benchmark MEDIUM

SDLLMFuzz: Dynamic-static LLM-assisted greybox fuzzing for structured input programs

Yihao Zou, Tianming Zheng, Futai Zou +1 more

Fuzzing has become a widely adopted technique for vulnerability discovery, yet it remains ineffective for structured-input programs due to strict...

2 months ago cs.CR cs.PL PDF

Defense MEDIUM

SafeAgent: A Runtime Protection Architecture for Agentic Systems

Hailin Liu, Eugene Ilyushin, Jie Ni +1 more

Large language model (LLM) agents are vulnerable to prompt-injection attacks that propagate through multi-step workflows, tool interactions, and...

2 months ago cs.AI cs.MA PDF

Attack MEDIUM

Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading

Jianming Tong, Hanshen Xiao, Krishna Kumar Nair +5 more

Multi-user virtual reality enables immersive interaction. However, rendering avatars for numerous participants on each headset incurs prohibitive...

2 months ago cs.CR cs.AR cs.CV PDF

Benchmark MEDIUM

Still Between Us? Evaluating and Improving Voice Assistant Robustness to Third-Party Interruptions

Dongwook Lee, Eunwoo Song, Che Hyun Lee +2 more

While recent Spoken Language Models (SLMs) have been actively deployed in real-world scenarios, they lack the capability to discern Third-Party...

2 months ago cs.CL cs.AI cs.SD PDF

Benchmark MEDIUM

GuardPhish: Securing Open-Source LLMs from Phishing Abuse

Rina Mishra, Gaurav Varshney, Doddipatla Sesha Sahithi

The rapid adoption of open-source Large Language Models (LLMs) in offline and enterprise environments has introduced a largely unexamined security...

2 months ago cs.CR PDF

Other MEDIUM

Feedback-Driven Execution for LLM-Based Binary Analysis

XiangRui Zhang, Qiang Li, Haining Wang

Binary analysis increasingly relies on large language models (LLMs) to perform semantic reasoning over complex program behaviors. However, existing...

2 months ago cs.CR PDF

Attack MEDIUM

Segment-Level Coherence for Robust Harmful Intent Probing in LLMs

Xuanli He, Bilgehan Sel, Faizan Ali +3 more

Large Language Models (LLMs) are increasingly exposed to adaptive jailbreaking, particularly in high-stakes Chemical, Biological, Radiological, and...

2 months ago cs.CL cs.CR PDF

Attack MEDIUM

NeuroTrace: Inference Provenance-Based Detection of Adversarial Examples

Firas Ben Hmida, Philemon Hailemariam, Kashif Ali Khan +1 more

Deep neural networks (DNNs) remain largely opaque at inference time, limiting our ability to detect and diagnose malicious input manipulations such...

2 months ago cs.CR PDF

Attack MEDIUM

From Where Words Come: Efficient Regularization of Code Tokenizers Through Source Attribution

Pavel Chizhov, Egor Bogomolov, Ivan P. Yamshchikov

Efficiency and safety of Large Language Models (LLMs), among other factors, rely on the quality of tokenization. A good tokenizer not only improves...

2 months ago cs.CL PDF

Benchmark MEDIUM

Learned or Memorized ? Quantifying Memorization Advantage in Code LLMs

Djiré Albérick Euraste, Kaboré Abdoul Kader, Jordan Samhi +3 more

The lack of transparency about code datasets used to train large language models (LLMs) makes it difficult to detect, evaluate, and mitigate data...

2 months ago cs.SE PDF

Survey MEDIUM

MCPThreatHive: Automated Threat Intelligence for Model Context Protocol Ecosystems

Yi Ting Shen, Kentaroh Toyoda, Alex Leung

The rapid proliferation of Model Context Protocol (MCP)-based agentic systems has introduced a new category of security threats that existing...

2 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

Xixun Lin, Yang Liu, Yancheng Chen +9 more

The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use,...

2 months ago cs.CR cs.AI PDF

Defense MEDIUM

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

Xiaohua Wang, Muzhao Tian, Yuqi Zeng +20 more

Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and...

2 months ago cs.LG PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial