AI Security Research

2,560+ academic papers on AI security, attacks, and defenses

Total: 2,560
Attack: 982
Benchmark: 736
Defense: 350
Tool: 275
Survey: 144

Showing 161–180 of 315 papers

Attack MEDIUM

UserLM-R1: Modeling Human Reasoning in User Language Models with Multi-Reward Reinforcement Learning

Feng Zhang, Shijia Li, Chunmao Zhang +7 more

User simulators serve as the critical interactive environment for agent post-training, and an ideal user simulator generalizes across domains and...

3 months ago cs.CL PDF

Attack MEDIUM

SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models

Renyang Liu, Kangjie Chen, Han Qiu +4 more

Image generation models (IGMs), while capable of producing impressive and creative content, often memorize a wide range of undesirable concepts from...

3 months ago cs.CV cs.AI cs.CR PDF

Attack MEDIUM

YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation

Abdelaziz Bounhar, Rania Hossam Elmohamady Elbadry, Hadi Abdine +3 more

Steering Large Language Models (LLMs) through activation interventions has emerged as a lightweight alternative to fine-tuning for alignment and...

3 months ago cs.AI PDF

Attack MEDIUM

MCP-ITP: An Automated Framework for Implicit Tool Poisoning in MCP

Ruiqi Li, Zhiqiang Wang, Yunhao Yao +1 more

To standardize interactions between LLM-based agents and their environments, the Model Context Protocol (MCP) was proposed and has since been widely...

4 months ago cs.CR cs.AI PDF

Attack MEDIUM

On the Adversarial Robustness of 3D Large Vision-Language Models

Chao Liu, Ngai-Man Cheung

3D Vision-Language Models (VLMs), such as PointLLM and GPT4Point, have shown strong reasoning and generalization abilities in 3D understanding tasks....

4 months ago cs.CV PDF

Attack MEDIUM

Projecting Out the Malice: A Global Subspace Approach to LLM Detoxification

Zenghao Duan, Zhiyi Yin, Zhichao Shi +8 more

Large language models (LLMs) exhibit exceptional performance but pose inherent risks of generating toxic content, restricting their safe deployment....

4 months ago cs.LG cs.AI PDF

Attack MEDIUM

Effects of personality steering on cooperative behavior in Large Language Model agents

Mizuki Sakai, Mizuki Yokoyama, Wakaba Tateishi +1 more

Large language models (LLMs) are increasingly used as autonomous agents in strategic and social interactions. Although recent studies suggest that...

4 months ago cs.AI PDF

Attack MEDIUM

Deep Dive into the Abuse of DL APIs To Create Malicious AI Models and How to Detect Them

Mohamed Nabeel, Oleksii Starov

According to Gartner, more than 70% of organizations will have integrated AI models into their workflows by the end of 2025. In order to reduce cost...

4 months ago cs.CR PDF

Attack MEDIUM

Merging Triggers, Breaking Backdoors: Defensive Poisoning for Instruction-Tuned Language Models

San Kim, Gary Geunbae Lee

Large Language Models (LLMs) have greatly advanced Natural Language Processing (NLP), particularly through instruction tuning, which enables broad...

4 months ago cs.CL cs.AI PDF

Attack MEDIUM

Enhancing Moral Diagnosis and Correction in Large Language Models

Bocheng Chen, Xi Chen, Han Zi +5 more

Identifying specific moral errors in an input and generating appropriate corrections require moral sensitivity in large language models (LLMs), which...

4 months ago cs.CL PDF

Attack MEDIUM

Extracting books from production language models

Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo +1 more

Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model's...

4 months ago cs.CL cs.AI cs.LG PDF

Attack MEDIUM

SWaRL: Safeguard Code Watermarking via Reinforcement Learning

Neusha Javidnia, Ruisi Zhang, Ashish Kundu +1 more

We present SWaRL, a robust and fidelity-preserving watermarking framework designed to protect the intellectual property of code LLM owners by...

4 months ago cs.CR cs.LG PDF

Attack MEDIUM

Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization

Jiwei Guan, Haibo Jin, Haohan Wang

Recent advancements in Large Vision-Language Models (LVLMs) have shown groundbreaking capabilities across diverse multimodal tasks. However, these...

4 months ago cs.CR cs.AI cs.CV PDF

Attack MEDIUM

Aggressive Compression Enables LLM Weight Theft

Davis Brown, Juan-Pablo Rivera, Dan Hendrycks +1 more

As frontier AIs become more powerful and costly to develop, adversaries have increasing incentives to steal model weights by mounting exfiltration...

4 months ago cs.CR cs.AI cs.LG PDF

Attack MEDIUM

IO-RAE: Information-Obfuscation Reversible Adversarial Example for Audio Privacy Protection

Jiajie Zhu, Xia Du, Xiaoyuan Liu +4 more

The rapid advancements in artificial intelligence have significantly accelerated the adoption of speech recognition technology, leading to its...

4 months ago cs.SD cs.CR cs.MM PDF

Attack MEDIUM

PatchBlock: A Lightweight Defense Against Adversarial Patches for Embedded EdgeAI Devices

Nandish Chattopadhyay, Abdul Basit, Amira Guesmi +3 more

Adversarial attacks pose a significant challenge to the reliable deployment of machine learning models in EdgeAI applications, such as autonomous...

4 months ago cs.CR cs.AI PDF

Attack MEDIUM

Rectifying Adversarial Examples Using Their Vulnerabilities

Fumiya Morimoto, Ryuto Morita, Satoshi Ono

Deep neural network-based classifiers are prone to errors when processing adversarial examples (AEs). AEs are minimally perturbed input data...

4 months ago cs.CR cs.LG cs.NE PDF

Attack MEDIUM

The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition

Xiaoze Liu, Weichen Yu, Matt Fredrikson +2 more

The open-weight language model ecosystem is increasingly defined by model composition techniques (such as weight merging, speculative decoding, and...

4 months ago cs.LG cs.CL cs.CR PDF

Attack MEDIUM

RAGPart & RAGMask: Retrieval-Stage Defenses Against Corpus Poisoning in Retrieval-Augmented Generation

Pankayaraj Pathmanathan, Michael-Andrei Panaitescu-Liess, Cho-Yu Jason Chiang +1 more

Retrieval-Augmented Generation (RAG) has emerged as a promising paradigm to enhance large language models (LLMs) with external knowledge, reducing...

4 months ago cs.IR PDF

Attack MEDIUM

RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress

Ruixuan Huang, Qingyue Wang, Hantao Huang +4 more

Mixture-of-Experts architectures have become the standard for scaling large language models due to their superior parameter efficiency. To...

4 months ago cs.CR cs.LG PDF
