AI Security Research

AI Threat Alert indexes 3,082+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.


Total: 3,082
Attack: 1,196
Benchmark: 883
Defense: 421
Tool: 321
Survey: 181

Showing 921–940 of 3,082 papers

Benchmark MEDIUM

Bounded by Risk, Not Capability: Quantifying AI Occupational Substitution Rates via a Tech-Risk Dual-Factor Model

Shuyao Gao, Minghao Huang

The deployment of Large Language Models (LLMs) has ignited concerns about technological unemployment. Existing task-based evaluations predominantly...

3 months ago cs.CY econ.GN PDF

Tool HIGH

ShieldNet: Network-Level Guardrails against Emerging Supply-Chain Injections in Agentic Systems

Zhuowen Yuan, Zhaorun Chen, Zhen Xiang +5 more

Existing research on LLM agent security mainly focuses on prompt injection and unsafe input/output behaviors. However, as agents increasingly rely on...

3 months ago cs.AI PDF

Other LOW

Poisoned Identifiers Survive LLM Deobfuscation: A Case Study on Claude Opus 4.6

Luis Guzmán Lorenzo

When an LLM deobfuscates JavaScript, can poisoned identifier names in the string table survive into the model's reconstructed code, even when the...

3 months ago cs.CR cs.AI cs.SE PDF

Tool MEDIUM

LLM-Enabled Open-Source Systems in the Wild: An Empirical Study of Vulnerabilities in GitHub Security Advisories

Fariha Tanjim Shifat, Hariswar Baburaj, Ce Zhou +2 more

Large language models (LLMs) are increasingly embedded in open-source software (OSS) ecosystems, creating complex interactions among natural language...

3 months ago cs.CR cs.SE PDF

Attack MEDIUM

Semantics Over Syntax: Uncovering Pre-Authentication 5G Baseband Vulnerabilities

Qiqing Huang, Xingyu Wang, Wanda Guo +2 more

Modern 5G user equipment (UE) processes Radio Resource Control (RRC) configuration messages during early control-plane exchanges, before...

3 months ago cs.CR PDF

Attack MEDIUM

Towards Unveiling Vulnerabilities of Large Reasoning Models in Machine Unlearning

Aobo Chen, Chenxu Zhao, Chenglin Miao +1 more

Large language models (LLMs) possess strong semantic understanding, driving significant progress in data mining applications. This is further...

3 months ago cs.LG cs.CR PDF

Attack HIGH

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Siyuan Li, Zehao Liu, Xi Lin +6 more

As Large Language Models (LLMs) are increasingly deployed in complex applications, their vulnerability to adversarial attacks raises urgent safety...

3 months ago cs.CR cs.AI PDF

Benchmark LOW

Answering the Wrong Question: Reasoning Trace Inversion for Abstention in LLMs

Abinitha Gourabathina, Inkit Padhi, Manish Nagireddy +2 more

For Large Language Models (LLMs) to be reliably deployed, models must effectively know when not to answer: abstain. Reasoning models, in particular,...

3 months ago cs.AI PDF

Benchmark LOW

Neuro-RIT: Neuron-Guided Instruction Tuning for Robust Retrieval-Augmented Language Model

Jaemin Kim, Jae O Lee, Sumyeong Ahn +1 more

Retrieval-Augmented Language Models (RALMs) have demonstrated significant potential in knowledge-intensive tasks; however, they remain vulnerable to...

3 months ago cs.CL cs.AI PDF

Benchmark MEDIUM

Quantifying Self-Preservation Bias in Large Language Models

Matteo Migliarini, Joaquin Pereira Pizzini, Luca Moresca +3 more

Instrumental convergence predicts that sufficiently advanced AI agents will resist shutdown, yet current safety training (RLHF) may obscure this risk...

3 months ago cs.AI PDF

Attack MEDIUM

AEGIS: Adversarial Entropy-Guided Immune System -- Thermodynamic State Space Models for Zero-Day Network Evasion Detection

Vickson Ferrel

As TLS 1.3 encryption limits traditional Deep Packet Inspection (DPI), the security community has pivoted to Euclidean Transformer-based classifiers...

3 months ago cs.CR cs.LG PDF

Tool MEDIUM

MTI: A Behavior-Based Temperament Profiling System for AI Agents

Jihoon Jeong

AI models of equivalent capability can exhibit fundamentally different behavioral patterns, yet no standardized instrument exists to measure these...

3 months ago cs.AI cs.CL PDF

Defense HIGH

RuleForge: Automated Generation and Validation for Web Vulnerability Detection at Scale

Ayush Garg, Sophia Hager, Jacob Montiel +5 more

Security teams face a challenge: the volume of newly disclosed Common Vulnerabilities and Exposures (CVEs) far exceeds the capacity to manually...

3 months ago cs.CR cs.AI cs.CL PDF

Benchmark LOW

Teaching Students to Question the Machine: An AI Literacy Intervention Improves Students' Regulation of LLM Use in a Science Task

O. Clerc, R. Abdelghani, C. Desvaux +3 more

The rapid adoption of generative artificial intelligence (GenAI) in schools raises concerns about students' uncritical reliance on its outputs....

3 months ago cs.CY PDF

Benchmark MEDIUM

From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers

Yiheng Huang, Zhijia Zhao, Bihuan Chen +5 more

The model context protocol (MCP) standardizes how LLMs connect to external tools and data sources, enabling faster integration but introducing new...

3 months ago cs.CR cs.SE PDF

Benchmark LOW

AURA: Multimodal Shared Autonomy for Real-World Urban Navigation

Yukai Ma, Honglin He, Selina Song +2 more

Long-horizon navigation in complex urban environments relies heavily on continuous human operation, which leads to fatigue, reduced efficiency, and...

3 months ago cs.RO PDF

Defense MEDIUM

Assertain: Automated Security Assertion Generation Using Large Language Models

Shams Tarek, Dipayan Saha, Khan Thamid Hasan +3 more

The increasing complexity of modern system-on-chip designs amplifies hardware security risks and makes manual security property specification a major...

3 months ago cs.CR PDF

Attack HIGH

SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits

Zikai Zhang, Rui Hu, Olivera Kotevska +1 more

Large Language Models (LLMs) are powerful tools for answering user queries, yet they remain highly vulnerable to jailbreak attacks. Existing...

3 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models

Weidi Luo, Xiaofei Wen, Tenghao Huang +5 more

Large language models (LLMs) are increasingly deployed for everyday tasks, including food preparation and health-related guidance. However, food...

3 months ago cs.CR PDF

Defense MEDIUM

ClawSafety: "Safe" LLMs, Unsafe Agents

Bowen Wei, Yunbei Zhang, Jinhao Pan +5 more

Personal AI agents like OpenClaw run with elevated privileges on users' local machines, where a single successful prompt injection can leak...

3 months ago cs.AI PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended. It covers adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can follow the work behind emerging AI threats.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,082+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
