AI Security Research

2,560+ academic papers on AI security, attacks, and defenses

Total: 2,560
Attack: 982
Benchmark: 736
Defense: 350
Tool: 275
Survey: 144

Showing 701–720 of 879 papers

Attack HIGH

Learning to Attack: Uncovering Privacy Risks in Sequential Data Releases

Ziyao Cui, Minxing Zhang, Jian Pei

Privacy concerns have become increasingly critical in modern AI and data science applications, where sensitive information is collected, analyzed,...

6 months ago cs.CR cs.LG PDF

Attack HIGH

AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts

Yufan Liu, Wanqian Zhang, Huashan Chen +4 more

Despite rapid advancements in text-to-image (T2I) models, their safety mechanisms are vulnerable to adversarial prompts, which maliciously generate...

6 months ago cs.CV PDF

Attack HIGH

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

Yuchong Xie, Zesen Liu, Mingyu Luo +7 more

Modern coding agents integrated into IDEs orchestrate powerful tools and high-privilege system access, creating a high-stakes attack surface. Prior...

6 months ago cs.CR cs.AI PDF

Attack HIGH

CompressionAttack: Exploiting Prompt Compression as a New Attack Surface in LLM-Powered Agents

Zesen Liu, Zhixiang Zhang, Yuchong Xie +1 more

LLM-powered agents often use prompt compression to reduce inference costs, but this introduces a new security risk. Compression modules, which are...

6 months ago cs.CR cs.AI PDF

Tool HIGH

Sentra-Guard: A Multilingual Human-AI Framework for Real-Time Defense Against Adversarial LLM Jailbreaks

Md. Mehedi Hasan, Ziaur Rahman, Rafid Mostafiz +1 more

This paper presents a real-time modular defense system named Sentra-Guard. The system detects and mitigates jailbreak and prompt injection attacks...

6 months ago cs.CR cs.AI PDF

Attack HIGH

Cross-Paradigm Graph Backdoor Attacks with Promptable Subgraph Triggers

Dongyi Liu, Jiangtong Li, Dawei Cheng +1 more

Graph Neural Networks (GNNs) are vulnerable to backdoor attacks, where adversaries implant malicious triggers to manipulate model predictions....

6 months ago cs.CR cs.LG PDF

Attack HIGH

SecureLearn -- An Attack-agnostic Defense for Multiclass Machine Learning Against Data Poisoning Attacks

Anum Paracha, Junaid Arshad, Mohamed Ben Farah +1 more

Data poisoning attacks are a potential threat to machine learning (ML) models, aiming to manipulate training datasets to disrupt their performance....

6 months ago cs.CR cs.LG PDF

Attack HIGH

Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models

Pavlos Ntais

Large language models (LLMs) remain vulnerable to sophisticated prompt engineering attacks that exploit contextual framing to bypass safety...

6 months ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Uncovering the Persuasive Fingerprint of LLMs in Jailbreaking Attacks

Havva Alizadeh Noughabi, Julien Serbanescu, Fattane Zarrinkalam +1 more

Despite recent advances, Large Language Models remain vulnerable to jailbreak attacks that bypass alignment safeguards and elicit harmful outputs....

6 months ago cs.CL cs.AI PDF

Attack HIGH

$δ$-STEAL: LLM Stealing Attack with Local Differential Privacy

Kieu Dang, Phung Lai, NhatHai Phan +3 more

Large language models (LLMs) demonstrate remarkable capabilities across various tasks. However, their deployment introduces significant risks related...

6 months ago cs.CR PDF

Attack HIGH

Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks

Mahavir Dabas, Tran Huynh, Nikhil Reddy Billa +8 more

Large language models remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. Defending against novel...

6 months ago cs.LG PDF

Attack HIGH

Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses

Xingwei Zhong, Kar Wai Fok, Vrizlynn L. L. Thing

Multimodal large language models (MLLMs) comprise both visual and textual modalities to process vision-language tasks. However, MLLMs are...

6 months ago cs.CR PDF

Attack HIGH

The Trojan Example: Jailbreaking LLMs through Template Filling and Unsafety Reasoning

Mingrui Liu, Sixiao Zhang, Cheng Long +1 more

As Large Language Models (LLMs) become integral to computing infrastructure, safety alignment serves as the primary security control preventing the...

6 months ago cs.CR PDF

Attack HIGH

Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency

Yukun Jiang, Mingjie Li, Michael Backes +1 more

Despite their superior performance on a wide range of domains, large language models (LLMs) remain vulnerable to misuse for generating harmful...

6 months ago cs.CR PDF

Attack HIGH

Can Current Detectors Catch Face-to-Voice Deepfake Attacks?

Nguyen Linh Bao Nguyen, Alsharif Abuadbba, Kristen Moore +1 more

The rapid advancement of generative models has enabled the creation of increasingly stealthy synthetic voices, commonly referred to as audio...

6 months ago cs.CR cs.LG cs.MM PDF

Attack HIGH

Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training

Zheng-Xin Yong, Stephen H. Bach

We discover a novel and surprising phenomenon of unintentional misalignment in reasoning language models (RLMs), which we call self-jailbreaking....

6 months ago cs.CR cs.CL PDF

Attack HIGH

AdaDoS: Adaptive DoS Attack via Deep Adversarial Reinforcement Learning in SDN

Wei Shao, Yuhao Wang, Rongguang He +2 more

Existing defence mechanisms have demonstrated significant effectiveness in mitigating rule-based Denial-of-Service (DoS) attacks, leveraging...

6 months ago cs.CR cs.AI PDF

Attack HIGH

GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?

Chiyu Chen, Xinhao Song, Yunkai Chai +7 more

Vision-Language Models (VLMs) are increasingly deployed as autonomous agents to navigate mobile graphical user interfaces (GUIs). Operating in...

6 months ago cs.CR cs.AI PDF

Survey HIGH

Enhancing Security in Deep Reinforcement Learning: A Comprehensive Survey on Adversarial Attacks and Defenses

Wu Yichao, Wang Yirui, Ding Panpan +3 more

With the wide application of deep reinforcement learning (DRL) techniques in complex fields such as autonomous driving, intelligent manufacturing,...

6 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

Beyond Text: Multimodal Jailbreaking of Vision-Language and Audio Models through Perceptually Simple Transformations

Divyanshu Kumar, Shreyas Jena, Nitin Aravind Birur +3 more

Multimodal large language models (MLLMs) have achieved remarkable progress, yet remain critically vulnerable to adversarial attacks that exploit...

6 months ago cs.CR cs.MM PDF
