AI Security Research

2,583+ academic papers on AI security, attacks, and defenses

Total: 2,583 | Attack: 994 | Benchmark: 740 | Defense: 355 | Tool: 275 | Survey: 146

Showing 461–480 of 994 papers

Attack MEDIUM

ShellForge: Adversarial Co-Evolution of Webshell Generation and Multi-View Detection for Robust Webshell Defense

Yizhong Ding

Webshells remain a primary foothold for attackers to compromise servers, particularly within PHP ecosystems. However, existing detection mechanisms...

3 months ago cs.CR cs.AI PDF

Attack HIGH

Membership Inference Attacks Against Fine-tuned Diffusion Language Models

Yuetian Chen, Kaiyuan Zhang, Yuntao Du +5 more

Diffusion Language Models (DLMs) represent a promising alternative to autoregressive language models, using bidirectional masked token prediction....

3 months ago cs.LG cs.AI PDF

Attack HIGH

What Hard Tokens Reveal: Exploiting Low-confidence Tokens for Membership Inference Attacks against Large Language Models

Md Tasnim Jawad, Mingyan Xiao, Yanzhao Wu

With the widespread adoption of Large Language Models (LLMs) and increasingly stringent privacy regulations, protecting data privacy in LLMs has...

3 months ago cs.CR PDF

Attack HIGH

LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment

Haonan Zhang, Dongxia Wang, Yi Liu +2 more

Safety-aligned LLMs suffer from two failure modes: jailbreak (answering harmful inputs) and over-refusal (declining benign queries). Existing vector...

3 months ago cs.LG cs.AI PDF

Attack MEDIUM

LLMs Can Unlearn Refusal with Only 1,000 Benign Samples

Yangyang Guo, Ziwei Xu, Si Liu +2 more

This study reveals a previously unexplored vulnerability in the safety alignment of Large Language Models (LLMs). Existing aligned LLMs predominantly...

3 months ago cs.CR PDF

Attack MEDIUM

Contrastive Spectral Rectification: Test-Time Defense towards Zero-shot Adversarial Robustness of CLIP

Sen Nie, Jie Zhang, Zhuo Wang +2 more

Vision-language models (VLMs) such as CLIP have demonstrated remarkable zero-shot generalization, yet remain highly vulnerable to adversarial...

3 months ago cs.CV PDF

Attack HIGH

Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models

Harsh Chaudhari, Ethan Rathbun, Hanna Foerster +5 more

Chain-of-Thought (CoT) reasoning has emerged as a powerful technique for enhancing large language models' capabilities by generating intermediate...

3 months ago cs.CR cs.LG PDF

Attack HIGH

ARMOR: Agentic Reasoning for Methods Orchestration and Reparameterization for Robust Adversarial Attacks

Gabriel Lee Jun Rong, Christos Korgialas, Dion Jia Xu Ho +3 more

Existing automated attack suites operate as static ensembles with fixed sequences, lacking strategic adaptation and semantic awareness. This paper...

3 months ago cs.CV PDF

Attack HIGH

Comparison requires valid measurement: Rethinking attack success rate comparisons in AI red teaming

Alexandra Chouldechova, A. Feder Cooper, Solon Barocas +3 more

We argue that conclusions drawn about relative system safety or attack method efficacy via AI red teaming are often not supported by evidence...

3 months ago cs.LG PDF

Attack HIGH

Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems

Narek Maloyan, Dmitry Namiot

The proliferation of agentic AI coding assistants, including Claude Code, GitHub Copilot, Cursor, and emerging skill-based architectures, has...

3 months ago cs.CR PDF

Attack HIGH

Physical Prompt Injection Attacks on Large Vision-Language Models

Chen Ling, Kai Hu, Hangcheng Liu +3 more

Large Vision-Language Models (LVLMs) are increasingly deployed in real-world intelligent systems for perception and reasoning in open physical...

3 months ago cs.CV cs.AI PDF

Attack HIGH

Res-MIA: A Training-Free Resolution-Based Membership Inference Attack on Federated Learning Models

Mohammad Zare, Pirooz Shamsinejadbabaki

Membership inference attacks (MIAs) pose a serious threat to the privacy of machine learning models by allowing adversaries to determine whether a...

3 months ago cs.CR cs.AI cs.LG PDF

Attack MEDIUM

Robust Privacy: Inference-Time Privacy through Certified Robustness

Jiankai Jin, Xiangzheng Zhang, Zhao Liu +2 more

Machine learning systems can produce personalized outputs that allow an adversary to infer sensitive input attributes at inference time. We introduce...

3 months ago cs.LG cs.AI cs.CR PDF

Attack HIGH

On the Insecurity of Keystroke-Based AI Authorship Detection: Timing-Forgery Attacks Against Motor-Signal Verification

David Condrey

Recent proposals advocate using keystroke timing signals, specifically the coefficient of variation ($δ$) of inter-keystroke intervals, to...

3 months ago cs.CR cs.AI cs.HC PDF

Attack MEDIUM

GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints

Andy Zhu, Rongzhe Wei, Yupu Gu +1 more

Machine unlearning (MU) for large language models has become critical for AI safety, yet existing methods fail to generalize to Mixture-of-Experts...

3 months ago cs.LG cs.AI PDF

Attack HIGH

From Transactions to Exploits: Automated PoC Synthesis for Real-World DeFi Attacks

Xing Su, Hao Wu, Hanzhong Liang +4 more

Blockchain systems are increasingly targeted by on-chain attacks that exploit contract vulnerabilities to extract value rapidly and stealthily,...

3 months ago cs.CR cs.SE PDF

Attack HIGH

Persona Jailbreaking in Large Language Models

Jivnesh Sandhan, Fei Cheng, Tushar Sandhan +1 more

Large Language Models (LLMs) are increasingly deployed in domains such as education, mental health and customer support, where stable and consistent...

3 months ago cs.CL PDF

Attack MEDIUM

Feature-Space Adversarial Robustness Certification for Multimodal Large Language Models

Song Xia, Meiwen Ding, Chenqi Kong +2 more

Multimodal large language models (MLLMs) exhibit strong capabilities across diverse applications, yet remain vulnerable to adversarial perturbations...

3 months ago cs.LG cs.CV PDF

Attack HIGH

Attributing and Exploiting Safety Vectors through Global Optimization in Large Language Models

Fengheng Chu, Jiahao Chen, Yuhong Wang +4 more

While Large Language Models (LLMs) are aligned to mitigate risks, their safety guardrails remain fragile against jailbreak attacks. This reveals...

3 months ago cs.LG cs.CR PDF

Attack HIGH

Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs

Mingyu Yu, Lana Liu, Zhehao Zhao +2 more

The rapid advancement of Multimodal Large Language Models (MLLMs) has introduced complex security challenges, particularly at the intersection of...

3 months ago cs.CV cs.AI PDF
