AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 61–80 of 264 papers

Defense MEDIUM

Committed SAE-Feature Traces for Audited-Session Substitution Detection in Hosted LLMs

Ziyang Liu

Hosted-LLM providers have a silent-substitution incentive: advertise a stronger model while serving cheaper replies. Probe-after-return schemes such...

2 months ago cs.CR cs.AI

Defense MEDIUM

Owner-Harm: A Missing Threat Model for AI Agent Safety

Dongcheng Zhang, Yiqing Jiang

Existing AI agent safety benchmarks focus on generic criminal harm (cybercrime, harassment, weapon synthesis), leaving a systematic blind spot for a...

2 months ago cs.CR cs.AI cs.CL

Defense MEDIUM

TitanCA: Lessons from Orchestrating LLM Agents to Discover 100+ CVEs

Ting Zhang, Yikun Li, Chengran Yang +15 more

Software vulnerabilities remain one of the most persistent threats to modern digital infrastructure. While static application security testing (SAST)...

2 months ago cs.CR

Defense MEDIUM

SafeAgent: A Runtime Protection Architecture for Agentic Systems

Hailin Liu, Eugene Ilyushin, Jie Ni +1 more

Large language model (LLM) agents are vulnerable to prompt-injection attacks that propagate through multi-step workflows, tool interactions, and...

2 months ago cs.AI cs.MA

Defense MEDIUM

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

Xiaohua Wang, Muzhao Tian, Yuqi Zeng +20 more

Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and...

2 months ago cs.LG

Defense MEDIUM

Can Agents Secure Hardware? Evaluating Agentic LLM-Driven Obfuscation for IP Protection

Sujan Ghimire, Parsa Mirfasihi, Muhtasim Alam Chowdhury +6 more

The globalization of integrated circuit (IC) design and manufacturing has increased the exposure of hardware intellectual property (IP) to untrusted...

2 months ago cs.CR

Defense MEDIUM

Towards Platonic Representation for Table Reasoning: A Foundation for Permutation-Invariant Retrieval

Willy Carlos Tchuitcheu, Tan Lu, Ann Dooms

Historical approaches to Table Representation Learning (TRL) have largely adopted the sequential paradigms of Natural Language Processing (NLP). We...

2 months ago cs.AI

Defense MEDIUM

Detecting Safety Violations Across Many Agent Traces

Adam Stein, Davis Brown, Hamed Hassani +2 more

To identify safety violations, auditors often search over large sets of agent traces. This search is difficult because failures are often rare,...

2 months ago cs.AI cs.CL

Defense MEDIUM

LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety

Junxiao Yang, Haoran Liu, Jinzhe Tu +9 more

Large language models (LLMs) often demonstrate strong safety performance in high-resource languages, yet exhibit severe vulnerabilities when queried...

2 months ago cs.LG cs.AI cs.CL

Defense MEDIUM

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

Xuwei Ding, Skylar Zhai, Linxin Song +6 more

Computer-use agents (CUAs) can now autonomously complete complex tasks in real digital environments, but when misled, they can also be used to...

2 months ago cs.CR cs.AI

Defense MEDIUM

Towards Identification and Intervention of Safety-Critical Parameters in Large Language Models

Weiwei Qi, Zefeng Wu, Tianhang Zheng +4 more

Ensuring Large Language Model (LLM) safety is crucial, yet the lack of a clear understanding about safety mechanisms hinders the development of...

2 months ago cs.CR

Defense MEDIUM

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

Rui Zhang, Hongwei Li, Yun Shen +6 more

The deployment of large language models (LLMs) raises significant ethical and safety concerns. While LLM alignment techniques are adopted to improve...

2 months ago cs.CR cs.CL

Defense MEDIUM

SentinelSphere: Integrating AI-Powered Real-Time Threat Detection with Cybersecurity Awareness Training

Nikolaos D. Tantaroudas, Ilias Karachalios, Andrew J. McCracken

The field of cybersecurity is confronted with two interrelated challenges: a worldwide deficit of qualified practitioners and ongoing human-factor...

2 months ago cs.CE cs.AI cs.CR

Defense MEDIUM

VLMShield: Efficient and Robust Defense of Vision-Language Models against Malicious Prompts

Peigui Qi, Kunsheng Tang, Yanpu Yu +7 more

Vision-Language Models (VLMs) face significant safety vulnerabilities from malicious prompt attacks due to weakened alignment during visual...

2 months ago cs.LG

Defense MEDIUM

Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization

Igor Maljkovic, Maria Rosaria Briglia, Iacopo Masi +2 more

Vision-Language Models (VLMs) have become essential for tasks such as image synthesis, captioning, and retrieval by aligning textual and visual...

2 months ago cs.CR cs.AI cs.CV

Defense MEDIUM

MA-IDS: Multi-Agent RAG Framework for IoT Network Intrusion Detection with an Experience Library

Md Shamimul Islam, Luis G. Jaimes, Ayesha S. Dina

Network Intrusion Detection Systems (NIDS) face important limitations. Signature-based methods are effective for known attack patterns, but they...

2 months ago cs.CR cs.AI

Defense MEDIUM

Gradient-Controlled Decoding: A Safety Guardrail for LLMs with Dual-Anchor Steering

Purva Chiniya, Kevin Scaria, Sagar Chaturvedi

Large language models (LLMs) remain susceptible to jailbreak and direct prompt-injection attacks, yet the strongest defensive filters frequently...

2 months ago cs.CL

Defense MEDIUM

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

Zijun Wang, Haoqin Tu, Letian Zhang +11 more

OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services...

2 months ago cs.CR cs.AI cs.CL

Defense MEDIUM

Assertain: Automated Security Assertion Generation Using Large Language Models

Shams Tarek, Dipayan Saha, Khan Thamid Hasan +3 more

The increasing complexity of modern system-on-chip designs amplifies hardware security risks and makes manual security property specification a major...

2 months ago cs.CR

Defense MEDIUM

ClawSafety: "Safe" LLMs, Unsafe Agents

Bowen Wei, Yunbei Zhang, Jinhao Pan +5 more

Personal AI agents like OpenClaw run with elevated privileges on users' local machines, where a single successful prompt injection can leak...

2 months ago cs.AI

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
