TrojanPraise: Jailbreak LLMs via Benign Fine-Tuning
Zhixin Xie, Xurui Song, Jun Luo
The demand for customized large language models (LLMs) has led commercial LLM providers to offer black-box fine-tuning APIs, yet this convenience introduces...
Anirudh Sekar, Mrinal Agarwal, Rachel Sharma +4 more
Prompt injection attacks have become an increasingly common vulnerability for LLM applications, where adversarial prompts exploit indirect input channels such...
János Kramár, Joshua Engels, Zheng Wang +4 more
Frontier language model capabilities are improving rapidly. We thus need stronger mitigations against bad actors misusing increasingly powerful...
Marco Arazzi, Antonino Nocera
Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical...
Aiman Al Masoud, Marco Arazzi, Antonino Nocera
Retrieval-Augmented Generation (RAG) has attracted significant attention due to its ability to combine the generative capabilities of Large Language...
Yipu Dou, Wang Yang
Large language model (LLM) safety evaluation is moving from content moderation to action security as modern systems gain persistent state, tool...
Chetan Pathade, Vinod Dhimam, Sheheryar Ahmad +1 more
Serverless computing has achieved widespread adoption, with over 70% of AWS organizations using serverless solutions [1]. Meanwhile, machine learning...
Yinzhi Zhao, Ming Wang, Shi Feng +3 more
Large language models (LLMs) have achieved impressive performance across natural language tasks and are increasingly deployed in real-world...
Christina Lu, Jack Gallagher, Jonathan Michala +2 more
Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training. We...
Luoming Hu, Jingjie Zeng, Liang Yang +1 more
Enhancing the moral alignment of Large Language Models (LLMs) is a critical challenge in AI safety. Current alignment techniques often act as...
Yuansen Liu, Yixuan Tang, Anthony Kum Hoe Tun
Current LLM safety research predominantly focuses on mitigating Goal Hijacking, preventing attackers from redirecting a model's high-level objective...
Murat Bilgehan Ertan, Marten van Dijk
Differentially Private Stochastic Gradient Descent (DP-SGD) is the dominant paradigm for private training, but its fundamental limitations under...
Hao Li, Yankai Yang, G. Edward Suh +2 more
Large Language Models (LLMs) have enabled the development of powerful agentic systems capable of automating complex workflows across various fields....
Oleg Brodt, Elad Feldman, Bruce Schneier +1 more
Prompt injection was initially framed as the large language model (LLM) analogue of SQL injection. However, over the past three years, attacks...
Zhiyi Mou, Jingyuan Yang, Zeheng Qian +6 more
While Large Language Models (LLMs) have powerful capabilities, they remain vulnerable to jailbreak attacks, a critical barrier to their safe...
Feng Zhang, Shijia Li, Chunmao Zhang +7 more
User simulators serve as the critical interactive environment for agent post-training, and an ideal user simulator generalizes across domains and...
Xiaonan Liu, Zhihao Li, Xiao Lan +3 more
Capture-the-Flag (CTF) competitions play a central role in modern cybersecurity as a platform for training practitioners and evaluating offensive and...
Fengchao Chen, Tingmin Wu, Van Nguyen +1 more
Large Language Models (LLMs) have enabled agents to move beyond conversation toward end-to-end task execution and become more helpful. However, this...
Renyang Liu, Kangjie Chen, Han Qiu +4 more
Image generation models (IGMs), while capable of producing impressive and creative content, often memorize a wide range of undesirable concepts from...
Abdelaziz Bounhar, Rania Hossam Elmohamady Elbadry, Hadi Abdine +3 more
Steering Large Language Models (LLMs) through activation interventions has emerged as a lightweight alternative to fine-tuning for alignment and...