AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Total

3,023
Attack

1,175
Benchmark

866
Defense

407
Tool

319
Survey

176

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 61–80 of 407 papers

Clear filters

Defense MEDIUM

Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training

Christian Moya, Alex Semendinger, Guang Lin +1 more

Preference learning methods such as Direct Preference Optimization (DPO) are known to induce reliance on spurious correlations, leading to sycophancy...

1 months ago cs.LG cs.AI PDF

Defense MEDIUM

FedSurrogate: Backdoor Defense in Federated Learning via Layer Criticality and Surrogate Replacement

Fatima Z. Abacha, Sin G. Teo, Yuanxiang Wu +2 more

Federated Learning remains highly susceptible to backdoor attacks--malicious clients inject targeted behaviours into the global model. Existing...

1 months ago cs.CR cs.LG PDF

Defense HIGH

Beyond Red-Teaming: Formal Guarantees of LLM Guardrail Classifiers

Nikita Kezins, Urbas Ekka, Pascal Berrang +1 more

Guardrail Classifiers defend production language models against harmful behavior, but although results seem promising in testing, they provide no...

1 months ago cs.LG PDF

Defense LOW

Conformity Generates Collective Misalignment in AI Agents Societies

Giordano De Marzo, Alessandro Bellina, Claudio Castellano +2 more

Artificial intelligence safety research focuses on aligning individual language models with human values, yet deployed AI systems increasingly...

1 months ago physics.soc-ph cs.CL cs.MA PDF

Defense MEDIUM

Intrinsic Guardrails: How Semantic Geometry of Personality Interacts with Emergent Misalignment in LLMs

Krishak Aneja, Manas Mittal, Anmol Goel +2 more

Fine-tuning Large Language Models (LLMs) on benign narrow data can sometimes induce broad harmful behaviors, a vulnerability termed emergent...

1 months ago cs.CL cs.AI PDF

Defense LOW

GuardAD: Safeguarding Autonomous Driving MLLMs via Markovian Safety Logic

Tianyuan Zhang, Peng Yue, Zihao Peng +8 more

Multimodal large language models (MLLMs) are increasingly integrated into autonomous driving (AD) systems; however, they remain vulnerable to diverse...

1 months ago cs.AI PDF

Defense HIGH

VulTriage: Triple-Path Context Augmentation for LLM-Based Vulnerability Detection

Wenxin Tang, Xiang Zhang, Junliang Liu +11 more

Automated vulnerability detection is a fundamental task in software security, yet existing learning-based methods still struggle to capture the...

1 months ago cs.AI PDF

Defense LOW

Automated alignment is harder than you think

Aleksandr Bowkis, Marie Davidsen Buhl, Jacob Pfau +1 more

A leading proposal for aligning artificial superintelligence (ASI) is to use AI agents to automate an increasing fraction of alignment research as...

1 months ago cs.AI PDF

Defense MEDIUM

ClawGuard: Out-of-Band Detection of LLM Agent Workflow Hijacking via EM Side Channel

Leo Linqian Gan, Jeffery Wu, Longyuan Ge +6 more

Autonomous LLM agents face a critical security risk known as workflow hijacking, where attackers subtly alter tool and skill invocations. Existing...

1 months ago cs.CR PDF

Defense MEDIUM

Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks

Guoxin Lu, Letian Sha, Qing Wang +4 more

The safety alignment of Large Language Models (LLMs) remains vulnerable to Harmful Fine-tuning (HFT). While existing defenses impose constraints on...

1 months ago cs.CR cs.AI cs.CL PDF

Defense MEDIUM

Lightweight Stylistic Consistency Profiling: Robust Detection of LLM-Generated Textual Content for Multimedia Moderation

Siyuan Li, Aodu Wulianghai, Xi Lin +6 more

The increasing prevalence of Large Language Models (LLMs) in content creation has made distinguishing human-written textual content from...

1 months ago cs.CL PDF

Defense MEDIUM

One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

Xinjie Shen, Rongzhe Wei, Peizhi Niu +6 more

Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful...

1 months ago cs.CL cs.AI cs.CR PDF

Defense LOW

SLAM: Structural Linguistic Activation Marking for Language Models

Fabrice Harel-Canada, Amit Sahai

LLM watermarks must be detectable without compromising text quality, yet most existing schemes bias the next-token distribution and pay for detection...

1 months ago cs.CL cs.AI PDF

Defense MEDIUM

You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation

Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera +2 more

The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank...

1 months ago cs.CR PDF

Defense LOW

RangeGuard: Efficient, Bounded Approximate Error Correction for Reliable DNNs

Hanum Ko, Sangheum Yeon, Jong Hwan Ko +1 more

As DRAM scales in density and adopts 3D integration, raw fault rates increase and multi-bit errors are no longer rare. Such errors can severely...

1 months ago cs.AR PDF

Defense LOW

Experiment-as-Code Labs: A Declarative Stack for AI-Driven Scientific Discovery

Zhenning Yang, Yuhan Chen, Patrick Tser Jern Kon +5 more

To unleash the full potential of AI for Science, we must untether the agents from a purely digital environment. The agent's ability to control and...

1 months ago eess.SY cs.AI PDF

Defense LOW

Robust Agent Compensation (RAC): Teaching AI Agents to Compensate

Srinath Perera, Kaviru Hapuarachchi, Frank Leymann +1 more

We present Robust Agent Compensation (RAC), a log-based recovery paradigm (providing a safety net) implemented through an architectural extension...

1 months ago cs.AI PDF

Defense MEDIUM

Self-Mined Hardness for Safety Fine-Tuning

Prakhar Gupta, Garv Shah, Donghua Zhang

Safety fine-tuning of language models typically requires a curated adversarial dataset. We take a different approach: score each candidate prompt's...

1 months ago cs.LG cs.AI cs.CR PDF

Defense LOW

Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models

Sandra Arcos-Holzinger, Sarah M. Erfani, James Bailey +1 more

Self-supervised speech models (S3Ms) achieve strong downstream performance, yet their learned representations remain poorly understood under natural...

1 months ago eess.AS cs.CR cs.LG PDF

Defense MEDIUM

RefusalGuard: Geometry-Preserving Fine-Tuning for Safety in LLMs

Sadia Asif, Mohammad Mohammadi Amiri

Fine-tuning safety-aligned language models for downstream tasks often leads to substantial degradation of refusal behavior, making models vulnerable...

1 months ago cs.LG cs.AI cs.CE PDF

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial