AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total: 2,529
Attack: 969
Benchmark: 729
Defense: 345
Tool: 272
Survey: 142


Showing 281–300 of 312 papers


Attack MEDIUM

Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization

Tiancheng Xing, Jerry Li, Yixuan Du +1 more

Large language models (LLMs) are increasingly used as rerankers in information retrieval, yet their ranking behavior can be steered by small,...

7 months ago cs.CL cs.AI cs.IR

Attack MEDIUM

Adversarial Reinforcement Learning for Large Language Model Agent Safety

Zizhao Wang, Dingcheng Li, Vaishakh Keshava +4 more

Large Language Model (LLM) agents can leverage tools such as Google Search to complete complex tasks. However, this tool usage introduces the risk of...

7 months ago cs.LG cs.AI cs.CL

Attack MEDIUM

From Poisoned to Aware: Fostering Backdoor Self-Awareness in LLMs

Guangyu Shen, Siyuan Cheng, Xiangzhe Xu +4 more

Large Language Models (LLMs) can acquire deceptive behaviors through backdoor attacks, where the model executes prohibited actions whenever secret...

7 months ago cs.CR cs.AI

Attack MEDIUM

Cross-Modal Content Optimization for Steering Web Agent Preferences

Tanqiu Jiang, Min Bai, Nikolaos Pappas +2 more

Vision-language model (VLM)-based web agents increasingly power high-stakes selection tasks like content recommendation or product ranking by...

7 months ago cs.AI cs.CR

Attack MEDIUM

Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs

Fatmazohra Rezkellah, Ramzi Dakhmouche

With the increasing adoption of Large Language Models (LLMs), more customization is needed to ensure privacy-preserving and safe generation. We...

7 months ago cs.LG cs.CL cs.CR

Attack MEDIUM

Adversarial Reinforcement Learning for Offensive and Defensive Agents in a Simulated Zero-Sum Network Environment

Abrar Shahid, Ibteeker Mahir Ishum, AKM Tahmidul Haque +2 more

This paper presents a controlled study of adversarial reinforcement learning in network security through a custom OpenAI Gym environment that models...

7 months ago cs.LG cs.AI cs.CR

Attack MEDIUM

Inverse Language Modeling towards Robust and Grounded LLMs

Davide Gabrielli, Simone Sestito, Iacopo Masi

The current landscape of defensive mechanisms for LLMs is fragmented and underdeveloped, unlike prior work on classifiers. To further promote...

7 months ago cs.CL

Attack MEDIUM

Securing generative artificial intelligence with parallel magnetic tunnel junction true randomness

Youwei Bao, Shuhan Yang, Hyunsoo Yang

Deterministic pseudo random number generators (PRNGs) used in generative artificial intelligence (GAI) models produce predictable patterns vulnerable...

7 months ago cs.LG cond-mat.mtrl-sci physics.data-an

Attack MEDIUM

AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning

Zhenyu Pan, Yiting Zhang, Zhuo Liu +13 more

LLM-based multi-agent systems excel at planning, tool use, and role coordination, but their openness and interaction complexity also expose them to...

7 months ago cs.AI

Attack MEDIUM

Bypassing Prompt Guards in Production with Controlled-Release Prompting

Jaiden Fairoze, Sanjam Garg, Keewoo Lee +1 more

As large language models (LLMs) advance, ensuring AI safety and alignment is paramount. One popular approach is prompt guards, lightweight mechanisms...

7 months ago cs.LG cs.CR

Attack MEDIUM

Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors

Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang +1 more

Existing data poisoning attacks on retrieval-augmented generation (RAG) systems scale poorly because they require costly optimization of poisoned...

7 months ago cs.LG cs.CL cs.CR

Attack MEDIUM

Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness

Tsubasa Takahashi, Shojiro Yamabe, Futa Waseda +1 more

Differential Attention (DA) has been proposed as a refinement to standard attention, suppressing redundant or noisy context through a subtractive...

7 months ago cs.LG cs.CR

Attack MEDIUM

Has the Two-Decade-Old Prophecy Come True? Artificial Bad Intelligence Triggered by Merely a Single-Bit Flip in Large Language Models

Yu Yan, Siqi Lu, Yang Gao +4 more

Recently, Bit-Flip Attack (BFA) has garnered widespread attention for its ability to compromise software system integrity remotely through hardware...

7 months ago cs.CR

Attack MEDIUM

A Call to Action for a Secure-by-Design Generative AI Paradigm

Dalal Alharthi, Ivan Roberto Kawaminami Garcia

Large language models have gained widespread prominence, yet their vulnerability to prompt injection and other adversarial attacks remains a critical...

7 months ago cs.CR cs.AI cs.LG

Attack MEDIUM

MOLM: Mixture of LoRA Markers

Samar Fares, Nurbek Tastan, Noor Hussein +1 more

Generative models can generate photorealistic images at scale. This raises urgent concerns about the ability to detect synthetically generated images...

7 months ago cs.CV cs.CR cs.LG

Attack MEDIUM

CHAI: Command Hijacking against embodied AI

Luis Burbano, Diego Ortiz, Qi Sun +5 more

Embodied Artificial Intelligence (AI) promises to handle edge cases in robotic vehicle systems where data is scarce by using common-sense reasoning...

7 months ago cs.CR cs.AI cs.LG

Attack MEDIUM

Are Robust LLM Fingerprints Adversarially Robust?

Anshul Nasery, Edoardo Contente, Alkin Kaz +2 more

Model fingerprinting has emerged as a promising paradigm for claiming model ownership. However, robustness evaluations of these schemes have mostly...

7 months ago cs.CR cs.AI cs.LG

Attack MEDIUM

DeepProv: Behavioral Characterization and Repair of Neural Networks via Inference Provenance Graph Analysis

Firas Ben Hmida, Abderrahmen Amich, Ata Kaboudi +1 more

Deep neural networks (DNNs) are increasingly being deployed in high-stakes applications, from self-driving cars to biometric authentication. However,...

7 months ago cs.CR cs.LG

Attack MEDIUM

The Impact of Scaling Training Data on Adversarial Robustness

Marco Zimmerli, Andreas Plesner, Till Aczel +1 more

Deep neural networks remain vulnerable to adversarial examples despite advances in architectures and training paradigms. We investigate how training...

7 months ago cs.CV cs.AI cs.CR

Attack MEDIUM

Better Privilege Separation for Agents by Restricting Data Types

Dennis Jacob, Emad Alghamdi, Zhanhao Hu +2 more

Large language models (LLMs) have become increasingly popular due to their ability to interact with unstructured content. As such, LLMs are now a key...

7 months ago cs.CR cs.LG
