AI Security Research

AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.

Adversarial attacks
Model defenses
Red-teaming benchmarks
Surveys
Security tooling

Papers by type

Total: 3,023
Attack: 1,175
Benchmark: 866
Defense: 407
Tool: 319
Survey: 176

Showing 301–320 of 407 papers

Defense MEDIUM

Matching Ranks Over Probability Yields Truly Deep Safety Alignment

Jason Vega, Gagandeep Singh

A frustratingly easy technique known as the prefilling attack has been shown to effectively circumvent the safety alignment of frontier LLMs by...

6 months ago cs.CR cs.AI

Defense MEDIUM

The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs

Jiale Zhao, Xing Mou, Jinlin Wu +7 more

Medical Multimodal Large Language Models (Medical MLLMs) have achieved remarkable progress in specialized medical tasks; however, research into their...

6 months ago cs.LG cs.AI cs.CL

Defense MEDIUM

One Detector Fits All: Robust and Adaptive Detection of Malicious Packages from PyPI to Enterprises

Biagio Montaruli, Luca Compagna, Serena Elisa Ponta +1 more

The rise of supply chain attacks via malicious Python packages demands robust detection solutions. Current approaches, however, overlook two critical...

6 months ago cs.CR cs.LG

Defense LOW

VLM as Strategist: Adaptive Generation of Safety-critical Testing Scenarios via Guided Diffusion

Xinzheng Wu, Junyi Chen, Naiting Zhong +1 more

The safe deployment of autonomous driving systems (ADSs) relies on comprehensive testing and evaluation. However, safety-critical scenarios that can...

6 months ago cs.RO cs.LG

Defense LOW

SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment

Yixuan Tang, Yi Yang

Aligning Large Language Models (LLMs) with human preferences typically relies on external supervision, which faces critical limitations: human...

6 months ago cs.CL

Defense MEDIUM

Real Time Detection and Quantitative Analysis of Spurious Forgetting in Continual Learning

Weiwei Wang

Catastrophic forgetting remains a fundamental challenge in continual learning for large language models. Recent work revealed that performance...

6 months ago cs.LG cs.AI cs.CL

Defense MEDIUM

The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search

Rongzhe Wei, Peizhi Niu, Xinjie Shen +7 more

Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. Existing approaches...

6 months ago cs.CR

Defense LOW

Minimal neuron ablation triggers catastrophic collapse in the language core of Large Vision-Language Models

Cen Lu, Yung-Chen Tang, Andrea Cavallaro

Large Vision-Language Models (LVLMs) have shown impressive multimodal understanding capabilities, yet their robustness is poorly understood. In this...

6 months ago cs.AI

Defense MEDIUM

SD-CGAN: Conditional Sinkhorn Divergence GAN for DDoS Anomaly Detection in IoT Networks

Henry Onyeka, Emmanuel Samson, Liang Hong +3 more

The increasing complexity of IoT edge networks presents significant challenges for anomaly detection, particularly in identifying sophisticated...

7 months ago cs.LG cs.CR

Defense MEDIUM

Are LLMs Good Safety Agents or a Propaganda Engine?

Neemesh Yadav, Francesco Ortu, Jiarui Liu +5 more

Large Language Models (LLMs) are trained to refuse to respond to harmful content. However, systematic analyses of whether this behavior is truly a...

7 months ago cs.CL

Defense HIGH

Retrieval-Augmented Few-Shot Prompting Versus Fine-Tuning for Code Vulnerability Detection

Fouad Trad, Ali Chehab

Few-shot prompting has emerged as a practical alternative to fine-tuning for leveraging the capabilities of large language models (LLMs) in...

7 months ago cs.SE cs.AI cs.CL

Defense LOW

Semantic Superiority vs. Forensic Efficiency: A Comparative Analysis of Deep Learning and Psycholinguistics for Business Email Compromise Detection

Yaw Osei Adjei, Frederick Ayivor, Davis Opoku

Business Email Compromise (BEC) is a sophisticated social engineering threat that manipulates organizational hierarchies, leading to significant...

7 months ago cs.LG cs.CR

Defense LOW

Normative active inference: A numerical proof of principle for a computational and economic legal analytic approach to AI governance

Axel Constant, Mahault Albarracin, Karl J. Friston

This paper presents a computational account of how legal norms can influence the behavior of artificial intelligence (AI) agents, grounded in the...

7 months ago cs.CY

Defense MEDIUM

Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation

Junbo Zhang, Ran Chen, Qianli Zhou +2 more

Large language models demonstrate powerful capabilities across various natural language processing tasks, yet they also harbor safety...

7 months ago cs.CR cs.CL

Defense MEDIUM

EAGER: Edge-Aligned LLM Defense for Robust, Efficient, and Accurate Cybersecurity Question Answering

Onat Gungor, Roshan Sood, Jiasheng Zhou +1 more

Large Language Models (LLMs) are highly effective for cybersecurity question answering (QA) but are difficult to deploy on edge devices due to their...

7 months ago cs.CR

Defense MEDIUM

Beyond Binary Classification: A Semi-supervised Approach to Generalized AI-generated Image Detection

Hong-Hanh Nguyen-Le, Van-Tuan Tran, Dinh-Thuc Nguyen +1 more

The rapid advancement of generators (e.g., StyleGAN, Midjourney, DALL-E) has produced highly realistic synthetic images, posing significant...

7 months ago cs.LG cs.AI cs.CR

Defense MEDIUM

SafeCiM: Investigating Resilience of Hybrid Floating-Point Compute-in-Memory Deep Learning Accelerators

Swastik Bhattacharya, Sanjay Das, Anand Menon +3 more

Deep Neural Networks (DNNs) continue to grow in complexity with Large Language Models (LLMs) incorporating vast numbers of parameters. Handling these...

7 months ago cs.AR cs.LG

Defense MEDIUM

Entropy-Based Measurement of Value Drift and Alignment Work in Large Language Models

Samih Fadli

Large language model safety is usually assessed with static benchmarks, but key failures are dynamic: value drift under distribution shift, jailbreak...

7 months ago cs.CL cs.AI cs.LG

Defense MEDIUM

When Harmless Words Harm: A New Threat to LLM Safety via Conceptual Triggers

Zhaoxin Zhang, Borui Chen, Yiming Hu +3 more

Recent research on large language model (LLM) jailbreaks has primarily focused on techniques that bypass safety mechanisms to elicit overtly harmful...

7 months ago cs.CL

Defense MEDIUM

N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator

Zheyu Lin, Jirui Yang, Yukui Qiu +3 more

Evaluating the safety robustness of LLMs is critical for their deployment. However, mainstream Red Teaming methods rely on online generation and...

7 months ago cs.LG cs.CR

Frequently asked questions

What is AI security research?

AI security research studies how AI and machine-learning systems can be attacked and defended, covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can follow the research behind emerging AI threats.

How many AI security papers does AI Threat Alert track?

AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.

Where do the research papers come from?

Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.

What topics does the AI security research cover?

Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.

How is this different from a generic paper search?

Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.

