Improved Pseudorandom Codes from Permuted Puzzles
Miranda Christ, Noah Golowich, Sam Gunn +2 more
Watermarks are an essential tool for identifying AI-generated content. Recently, Christ and Gunn (CRYPTO '24) introduced pseudorandom...
Joshua Ward, Bochao Gu, Chi-Hua Wang +1 more
Large Language Models (LLMs) have recently demonstrated remarkable performance in generating high-quality tabular synthetic data. In practice, two...
Botao 'Amber' Hu, Bangdao Chen
The emerging "agentic web" envisions large populations of autonomous agents coordinating, transacting, and delegating across open networks. Yet many...
Yinan Zhong, Qianhao Miao, Yanjiao Chen +3 more
Large Language Models (LLMs) have been integrated into many applications (e.g., web agents) to perform more sophisticated tasks. However,...
Tailun Chen, Yu He, Yan Wang +9 more
Retrieval-Augmented Generation (RAG) systems enhance LLMs with external knowledge but introduce a critical attack surface: corpus poisoning. While...
Zafaryab Haider, Md Hafizur Rahman, Shane Moeykens +2 more
Hard-to-detect hardware bit flips, from either malicious circuitry or bugs, have already been shown to make transformers vulnerable in non-generative...
Sampriti Soor, Suklav Ghosh, Arijit Sur
Language models are vulnerable to short adversarial suffixes that can reliably alter predictions. Previous works usually find such suffixes with...
Stephan Carney, Soham Hans, Sofia Hirschmann +4 more
Adversaries (hackers) attempting to infiltrate networks frequently face uncertainty in their operational environments. This research explores the...
Xiqiao Xiong, Ouxiang Li, Zhuo Liu +5 more
Large language models have seen widespread adoption, yet they remain vulnerable to multi-turn jailbreak attacks, threatening their safe deployment....
Ziming Hong, Tianyu Huang, Runnan Chen +4 more
Recent studies have extended diffusion-based instruction-driven 2D image editing pipelines to 3D Gaussian Splatting (3DGS), enabling faithful...
Max Zhang, Derek Liu, Kai Zhang +2 more
Large language models (LLMs) are increasingly deployed worldwide, yet their safety alignment remains predominantly English-centric. This allows for...
Yunzhe Li, Jianan Wang, Hongzi Zhu +3 more
Large Language Models (LLMs) have become foundational components in a wide range of applications, including natural language understanding and...
Richard Young
Despite substantial investment in safety alignment, the vulnerability of large language models to sophisticated multi-turn adversarial attacks...
George Mikros
Large language models (LLMs) present a dual challenge for forensic linguistics. They serve as powerful analytical tools enabling scalable corpus...
Sima Jafarikhah, Daniel Thompson, Eva Deans +2 more
Manual vulnerability scoring, such as assigning Common Vulnerability Scoring System (CVSS) scores, is a resource-intensive process that is often...
Donghang Duan, Xu Zheng, Yuefeng He +3 more
Current LLM-based text anonymization frameworks usually rely on remote API services from powerful LLMs, which creates an inherent privacy paradox:...
Songping Wang, Rufan Qian, Yueming Lyu +5 more
Image-to-Video (I2V) generation synthesizes dynamic visual content from image and text inputs, providing significant creative control. However, the...
Chenyu Zhang, Yiwen Ma, Lanjun Wang +3 more
Text-to-image (T2I) models commonly incorporate defense mechanisms to prevent the generation of sensitive images. Unfortunately, recent jailbreaking...
Shiji Zhao, Shukun Xiong, Yao Huang +7 more
Multimodal Large Language Models (MLLMs) are widely used in various fields due to their powerful cross-modal comprehension and generation...
Weikai Lu, Ziqian Zeng, Kehua Zhang +5 more
Multimodal Large Language Models (MLLMs) are increasingly vulnerable to multimodal Indirect Prompt Injection (IPI) attacks, which embed malicious...