Attack MEDIUM
Jianwei Li, Jung-Eun Kim
Backdoor attacks pose severe security threats to large language models (LLMs), where a model behaves normally under benign inputs but produces...
1 week ago cs.CR cs.AI cs.LG
Attack HIGH
Chenlong Yin, Runpeng Geng, Yanting Wang +1 more
Prompt injection poses serious security risks to real-world LLM applications, particularly autonomous agents. Although many defenses have been...
1 week ago cs.LG cs.CR
Attack HIGH
Zheng Gao, Yifan Yang, Xiaoyu Li +4 more
Watermarking the initial noise of diffusion models has emerged as a promising approach for image provenance, but content-independent noise patterns...
1 week ago cs.CV cs.CR cs.LG
Attack HIGH
Sihao Ding
We introduce Colluding LoRA (CoLoRA), an attack in which each adapter appears benign and plausibly functional in isolation, yet their linear...
1 week ago cs.CR cs.LG
Attack MEDIUM
Xiangkui Cao, Jie Zhang, Meina Kan +2 more
Large Vision-Language Models (LVLMs) have shown remarkable potential across a wide array of vision-language tasks, leading to their adoption in...
Attack HIGH
Darren Cheng, Wen-Kwang Tsao
Prompt injection remains one of the most practical attack vectors against LLM-integrated applications. We replicate the Microsoft LLMail-Inject...
1 week ago cs.CR cs.AI
Attack HIGH
Xinhai Wang, Shaopeng Fu, Shu Yang +3 more
Suffix jailbreak attacks serve as a systematic method for red-teaming Large Language Models (LLMs) but suffer from prohibitive computational costs,...
1 week ago cs.CR cs.AI
Attack HIGH
Davi Bonetto
State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a...
1 week ago cs.LG cs.CR
Attack HIGH
Alexandre Le Mercier, Thomas Demeester, Chris Develder
State space models (SSMs) like Mamba have gained significant traction as efficient alternatives to Transformers, achieving linear complexity while...
Attack MEDIUM
Haodong Zhao, Jinming Hu, Yijie Bai +6 more
Federated Language Model (FedLM) training allows collaborative learning without sharing raw data, yet it introduces a critical vulnerability, as every...
Attack HIGH
J Alex Corll
Prompt injection defenses are often framed as semantic understanding problems and delegated to increasingly large neural detectors. For the first...
1 week ago cs.CR cs.AI
Attack HIGH
Indranil Halder, Annesya Banerjee, Cengiz Pehlevan
Adversarial attacks can reliably steer safety-aligned large language models toward unsafe behavior. Empirically, we find that adversarial...
1 week ago cs.LG cs.AI
Attack HIGH
Nasim Soltani, Shayan Nejadshamsi, Zakaria Abou El Houda +4 more
Adversarial examples pose a serious threat to machine learning (ML) algorithms. If used to manipulate the behaviour of ML-based Network...
2 weeks ago cs.CR cs.AI
Attack HIGH
Scott Thornton
Retrieval-Augmented Generation (RAG) systems extend large language models (LLMs) with external knowledge sources but introduce new attack surfaces...
2 weeks ago cs.CR cs.AI cs.LG
Attack MEDIUM
Pratyay Kumar, Miguel Antonio Guirao Aguilera, Srikathyayani Srikanteswara +2 more
Model Context Protocol (MCP) servers have rapidly emerged over the past year as a widely adopted way to enable Large Language Model (LLM) agents to...
2 weeks ago cs.CR cs.AI
Attack HIGH
Nanzi Yang, Weiheng Bai, Kangjie Lu
The Model Context Protocol (MCP) is a recently proposed interoperability standard that unifies how AI agents connect with external tools and data...
2 weeks ago cs.CR cs.AI
Attack HIGH
Ailiya Borjigin, Igor Stadnyk, Ben Bilski +2 more
OpenClaw-style agent stacks turn language into privileged execution: LLM intents flow through tool interception, policy gates, and a local executor....
2 weeks ago cs.CR cs.AI
Attack HIGH
Fan Yang
The widespread adoption of thinking mode in large language models (LLMs) has significantly enhanced complex task processing capabilities while...
2 weeks ago cs.CR cs.AI
Attack MEDIUM
Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro, Peter Kairouz
As AI assistants become widely used, privacy-aware platforms like Anthropic's Clio have been introduced to generate insights from real-world AI use....
Attack MEDIUM
Jia Hu, Youcheng Sun, Pierre Olivier
Software compartmentalization breaks an application into compartments isolated from each other: an attacker taking over one compartment will be...