Attack MEDIUM
Mingyang Liao, Yichen Wan, Shuchen Wu +6 more
LLM-based role-playing has rapidly improved in fidelity, yet stronger adherence to persona constraints commonly increases vulnerability to jailbreak...
Attack HIGH
Ningyuan He, Ronghong Huang, Qianqian Tang +3 more
In-context learning (ICL) has become a powerful, data-efficient paradigm for text classification using large language models. However, its robustness...
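For context on the entry above: the ICL paradigm it describes amounts to putting labeled examples directly into the prompt, so the "training data" lives entirely in the demonstrations. Below is a minimal, illustrative sketch of few-shot text classification via ICL; the demonstrations, labels, and query are invented for illustration and are not drawn from the paper.

```python
# Minimal sketch of in-context learning (ICL) for text classification:
# the classifier is just a prompt containing labeled demonstrations.
# All demonstrations and labels here are illustrative placeholders.

demonstrations = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I want a refund; the product broke within a day.", "negative"),
    ("Service was fine, nothing special.", "neutral"),
]

def build_icl_prompt(query: str) -> str:
    """Assemble a few-shot classification prompt from labeled demonstrations."""
    lines = ["Classify the sentiment of each review as positive, negative, or neutral.", ""]
    for text, label in demonstrations:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")  # the model completes this label
    return "\n".join(lines)

print(build_icl_prompt("Shipping was slow but the item works great."))
```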
Attack MEDIUM
Wenhui Zhang, Huiyu Xu, Zhibo Wang +4 more
Recent advancements in multi-model AI systems have leveraged LLM routers to reduce computational cost while maintaining response quality by assigning...
Attack MEDIUM
Alvi Md Ishmam, Najibul Haque Sarker, Zaber Ibn Abdul Hakim +1 more
Multimodal Large Language Models (MLLMs) have achieved remarkable performance across vision-language tasks. Recent advancements allow these models to...
Attack MEDIUM
Arther Tian, Alex Ding, Frank Chen +2 more
Decentralized large language model inference networks require lightweight mechanisms to reward high-quality outputs under heterogeneous latency and...
3 months ago cs.CR cs.AI
Attack MEDIUM
Jarrod Barnes
As large language models (LLMs) improve, so do their offensive applications: frontier agents now generate working exploits for under $50 in compute...
Attack MEDIUM
Onkar Shelar, Travis Desell
Evolutionary prompt search is a practical black-box approach for red teaming large language models (LLMs), but existing methods often collapse onto a...
3 months ago cs.NE q-bio.PE
Attack HIGH
Xingwei Lin, Wenhao Lin, Sicong Cao +4 more
Multi-turn jailbreak attacks have emerged as a critical threat to Large Language Models (LLMs), bypassing safety mechanisms by progressively...
3 months ago cs.CR cs.AI
Attack MEDIUM
Yizhong Ding
Webshells remain a primary foothold for attackers to compromise servers, particularly within PHP ecosystems. However, existing detection mechanisms...
3 months ago cs.CR cs.AI
Attack HIGH
Yuetian Chen, Kaiyuan Zhang, Yuntao Du +5 more
Diffusion Language Models (DLMs) represent a promising alternative to autoregressive language models, using bidirectional masked token prediction...
3 months ago cs.LG cs.AI
Attack HIGH
Md Tasnim Jawad, Mingyan Xiao, Yanzhao Wu
With the widespread adoption of Large Language Models (LLMs) and increasingly stringent privacy regulations, protecting data privacy in LLMs has...
Attack HIGH
Haonan Zhang, Dongxia Wang, Yi Liu +2 more
Safety-aligned LLMs suffer from two failure modes: jailbreak (answering harmful inputs) and over-refusal (declining benign queries). Existing vector...
3 months ago cs.LG cs.AI
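For background on the "vector" methods this abstract alludes to, here is a toy sketch of the generic steering-vector idea: estimate a refusal direction as the mean activation difference between harmful and benign prompts, then shift hidden states along it at inference. All shapes and data below are synthetic stand-ins, and the paper's actual technique may differ.

```python
import numpy as np

# Toy sketch of activation steering for safety: a "refusal direction"
# estimated from paired activations, applied with a tunable strength.
# Everything below is synthetic; it illustrates the general idea only.

rng = np.random.default_rng(0)
d = 64                                     # hidden dimension (illustrative)
h_harmful = rng.normal(0.5, 1.0, (32, d))  # hidden states on harmful prompts
h_benign  = rng.normal(0.0, 1.0, (32, d))  # hidden states on benign prompts

# Refusal direction: mean activation difference between the two prompt sets.
refusal_dir = h_harmful.mean(axis=0) - h_benign.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

def steer(h: np.ndarray, alpha: float) -> np.ndarray:
    """Shift a hidden state along the refusal direction.
    alpha > 0 pushes toward refusal; alpha < 0 pushes away from it,
    which is exactly the jailbreak/over-refusal trade-off."""
    return h + alpha * refusal_dir

h = rng.normal(0.0, 1.0, d)
print(float(h @ refusal_dir), float(steer(h, 2.0) @ refusal_dir))
```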
Attack MEDIUM
Yangyang Guo, Ziwei Xu, Si Liu +2 more
This study reveals a previously unexplored vulnerability in the safety alignment of Large Language Models (LLMs). Existing aligned LLMs predominantly...
Attack MEDIUM
Sen Nie, Jie Zhang, Zhuo Wang +2 more
Vision-language models (VLMs) such as CLIP have demonstrated remarkable zero-shot generalization, yet remain highly vulnerable to adversarial...
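As background on the attack class this abstract names, the canonical adversarial perturbation is the fast gradient sign method (FGSM). The toy sketch below uses a linear model with an analytic gradient as a stand-in for CLIP; it illustrates the general technique, not this paper's method.

```python
import numpy as np

# FGSM sketch: perturb the input one epsilon-step along the sign of the
# loss gradient. A toy linear classifier stands in for a real encoder.

rng = np.random.default_rng(2)
w = rng.normal(size=128)            # toy linear "encoder" weights (illustrative)
x = rng.normal(size=128)            # clean input features
y = 1.0                             # true label in {-1, +1}

# Logistic loss L = log(1 + exp(-y * w.x)); its gradient w.r.t. x is
# -y * w * sigmoid(-y * w.x), which always points toward higher loss.
margin = y * (w @ x)
grad_x = -y * w / (1.0 + np.exp(margin))

eps = 0.1                           # L-infinity perturbation budget (illustrative)
x_adv = x + eps * np.sign(grad_x)   # FGSM: one step along the gradient's sign

print("clean margin:", y * (w @ x))
print("adversarial margin:", y * (w @ x_adv))
```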
Attack HIGH
Harsh Chaudhari, Ethan Rathbun, Hanna Foerster +5 more
Chain-of-Thought (CoT) reasoning has emerged as a powerful technique for enhancing large language models' capabilities by generating intermediate...
3 months ago cs.CR cs.LG
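For readers unfamiliar with the underlying technique, this is the CoT prompting pattern the abstract refers to: a worked exemplar with intermediate steps plus a step-by-step trigger phrase. The exemplar and question below are invented for illustration and are not from the paper.

```python
# Sketch of the Chain-of-Thought prompting pattern: the prompt elicits
# intermediate reasoning before the final answer. Content is illustrative.

question = "A server handles 120 requests per minute. How many in 1.5 hours?"

# Direct prompting asks for the answer immediately.
direct_prompt = f"Q: {question}\nA:"

# CoT prompting adds a worked exemplar showing intermediate steps, plus the
# usual "step by step" trigger, so the model reasons before answering.
cot_prompt = (
    "Q: A cache holds 40 items and evicts 10 per hour. "
    "How many remain after 2 hours?\n"
    "A: It evicts 10 * 2 = 20 items, so 40 - 20 = 20 remain. The answer is 20.\n"
    f"Q: {question}\n"
    "A: Let's think step by step."
)

print(cot_prompt)
```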
Attack HIGH
Gabriel Lee Jun Rong, Christos Korgialas, Dion Jia Xu Ho +3 more
Existing automated attack suites operate as static ensembles with fixed sequences, lacking strategic adaptation and semantic awareness. This paper...
Attack HIGH
Alexandra Chouldechova, A. Feder Cooper, Solon Barocas +3 more
We argue that conclusions drawn about relative system safety or attack method efficacy via AI red teaming are often not supported by evidence...
Attack HIGH
Narek Maloyan, Dmitry Namiot
The proliferation of agentic AI coding assistants, including Claude Code, GitHub Copilot, Cursor, and emerging skill-based architectures, has...
Attack HIGH
Chen Ling, Kai Hu, Hangcheng Liu +3 more
Large Vision-Language Models (LVLMs) are increasingly deployed in real-world intelligent systems for perception and reasoning in open physical...
3 months ago cs.CV cs.AI
Attack HIGH
Mohammad Zare, Pirooz Shamsinejadbabaki
Membership inference attacks (MIAs) pose a serious threat to the privacy of machine learning models by allowing adversaries to determine whether a...
3 months ago cs.CR cs.AI cs.LG
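As background on the attack class above: the classic loss-threshold MIA guesses that a sample was a training member when the model's loss on it is unusually low, since members tend to be fit more tightly. The sketch below uses synthetic loss values and an arbitrary threshold; it illustrates the general idea, not this paper's approach.

```python
import numpy as np

# Loss-threshold membership inference sketch: predict "member" when the
# model's loss on a sample falls below a threshold. Losses are synthetic.

rng = np.random.default_rng(1)
member_losses     = rng.gamma(shape=1.0, scale=0.3, size=1000)  # members: low loss
non_member_losses = rng.gamma(shape=2.0, scale=0.6, size=1000)  # non-members: higher

threshold = 0.5   # illustrative; real attacks calibrate this on shadow models

def infer_membership(loss: np.ndarray) -> np.ndarray:
    """Guess 'training member' whenever the loss is below the threshold."""
    return loss < threshold

tpr = infer_membership(member_losses).mean()       # fraction of members caught
fpr = infer_membership(non_member_losses).mean()   # fraction of false alarms
print(f"TPR={tpr:.2f}  FPR={fpr:.2f}")
```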