Attack MEDIUM
Lucas Fenaux, Christopher Srinivasa, Florian Kerschbaum
Transparency and security are both central to Responsible AI, but they may conflict in adversarial settings. We investigate the strategic effect of...
5 months ago cs.LG cs.CR cs.GT
Attack HIGH
Lama Sleem, Jerome Francois, Lujun Li +3 more
Jailbreak attacks designed to bypass safety mechanisms pose a serious threat by prompting LLMs to generate harmful or inappropriate content, despite...
5 months ago cs.CR cs.AI
Attack MEDIUM
Farhad Abtahi, Fernando Seoane, Iván Pau +1 more
Healthcare AI systems face major vulnerabilities to data poisoning that current defenses and regulations cannot adequately address. We analyzed eight...
6 months ago cs.CR cs.AI
Attack HIGH
Runpeng Geng, Yanting Wang, Chenlong Yin +3 more
Long context LLMs are vulnerable to prompt injection, where an attacker can inject an instruction in a long context to induce an LLM to generate an...
6 months ago cs.CR cs.AI cs.CL
Attack HIGH
Srikant Panda, Avinash Rai
Large Language Models (LLMs) are commonly evaluated for robustness against paraphrased or semantically equivalent jailbreak prompts, yet little...
6 months ago cs.CL cs.AI
Attack HIGH
Shuaitong Liu, Renjue Li, Lijia Yu +3 more
Recent advances in Chain-of-Thought (CoT) prompting have substantially improved the reasoning capabilities of large language models (LLMs), but have...
6 months ago cs.CR cs.AI
Attack HIGH
Yudong Yang, Xuezhen Zhang, Zhifeng Han +6 more
Recent progress in LLMs has enabled understanding of audio signals, but has also exposed new safety risks arising from complex audio inputs that are...
6 months ago cs.SD cs.AI
Attack HIGH
Zihan Wang, Guansong Pang, Wenjun Miao +2 more
Recent advances in Large Visual Language Models (LVLMs) have demonstrated impressive performance across various vision-language tasks by leveraging...
Attack LOW
Xin Zhao, Xiaojun Chen, Bingshan Liu +3 more
Generative vision-language models like Stable Diffusion demonstrate remarkable capabilities in creative media synthesis, but they also pose...
6 months ago cs.AI cs.CR cs.CV
Attack HIGH
Shigeki Kusaka, Keita Saito, Mikoto Kudo +3 more
Large language models (LLMs) are increasingly deployed in real-world systems, making it critical to understand their vulnerabilities. While data...
6 months ago cs.LG cs.AI
Attack HIGH
Hongyi Li, Chengxuan Zhou, Chu Wang +5 more
Large Audio-language Models (LAMs) have recently enabled powerful speech-based interactions by coupling audio encoders with Large Language Models...
Attack MEDIUM
Zixun Xiong, Gaoyi Wu, Qingyang Yu +5 more
Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial....
6 months ago cs.CR cs.AI
Attack HIGH
Tiago Machado, Maysa Malfiza Garcia de Macedo, Rogerio Abreu de Paula +5 more
This work investigates how different Large Language Model (LLM) alignment methods affect the models' responses to prompt attacks. We...
Attack MEDIUM
Giorgio Piras, Raffaele Mura, Fabio Brau +3 more
Refusal refers to the functional behavior enabling safety-aligned language models to reject harmful or unethical prompts. Following the growing...
6 months ago cs.AI cs.LG
Attack HIGH
Yuxuan Zhou, Yuzhao Peng, Yang Bai +7 more
Large Vision-Language Models (VLMs) are susceptible to jailbreak attacks: researchers have developed a variety of attack strategies that can...
Attack LOW
Ke Jia, Yuheng Ma, Yang Li +1 more
We revisit the problem of generating synthetic data under differential privacy. To address the core limitations of marginal-based methods, we propose...
6 months ago stat.ML cs.CR cs.LG
Attack HIGH
Yaxin Xiao, Qingqing Ye, Zi Liang +4 more
Machine learning models constitute valuable intellectual property, yet remain vulnerable to model extraction attacks (MEA), where adversaries...
6 months ago cs.CR cs.CV cs.LG
Attack HIGH
Xingyu Li, Xiaolei Liu, Cheng Liu +4 more
As large language models (LLMs) scale, their inference incurs substantial computational resources, exposing them to energy-latency attacks, where...
6 months ago cs.CR cs.AI cs.CL
Attack MEDIUM
Hanlin Cai, Houtianfu Wang, Haofan Dong +3 more
Internet of Agents (IoA) envisions a unified, agent-centric paradigm where heterogeneous large language model (LLM) agents can interconnect and...
6 months ago cs.NI cs.CL
Attack MEDIUM
Zhisheng Zhang, Derui Wang, Yifan Mi +6 more
Recent advancements in speech synthesis technology have enriched our daily lives, with high-quality and human-like audio widely adopted across...
6 months ago cs.SD cs.AI cs.CR