Read the Scene, Not the Script: Outcome-Aware Safety for LLMs
Rui Wu, Yihao Quan, Zeru Shi, et al.
Safety-aligned Large Language Models (LLMs) still show two dominant failure modes: they are easily jailbroken, or they over-refuse harmless inputs...
Rijha Safdar, Danyail Mateen, Syed Taha Ali, et al.
Artificial Intelligence (AI), and more specifically Large Language Models (LLMs), have demonstrated exceptional progress in multiple areas including...
Guangyu Shen, Siyuan Cheng, Xiangzhe Xu, et al.
Large Language Models (LLMs) can acquire deceptive behaviors through backdoor attacks, where the model executes prohibited actions whenever secret...
Jehyeok Yeon, Isha Chaudhary, Gagandeep Singh
Large language models (LLMs) are increasingly deployed in agentic systems where they map user intents to relevant external tools to fulfill a task. A...
Chengxiao Wang, Isha Chaudhary, Qian Hu, et al.
Large Language Models (LLMs) can produce catastrophic responses in conversational settings that pose serious risks to public safety and security....
Hangting Ye, Jinmeng Li, He Zhao, et al.
Existing anomaly detection (AD) methods for tabular data usually rely on some assumptions about anomaly patterns, leading to inconsistent performance...
Bumjun Kim, Dongjae Jeon, Dueun Kim, et al.
Diffusion large language models (dLLMs) have emerged as a promising alternative to autoregressive models, offering flexible generation orders and...
Tanqiu Jiang, Min Bai, Nikolaos Pappas, et al.
Vision-language model (VLM)-based web agents increasingly power high-stakes selection tasks like content recommendation or product ranking by...
Fatmazohra Rezkellah, Ramzi Dakhmouche
With the increasing adoption of Large Language Models (LLMs), more customization is needed to ensure privacy-preserving and safe generation. We...
Kartik Pandit, Sourav Ganguly, Arnesh Banerjee, et al.
Ensuring safety is a foundational requirement for large language models (LLMs). Achieving an appropriate balance between enhancing the utility of...
Imene Kerboua, Sahar Omidi Shayegan, Megh Thakkar, et al.
Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals; these pages often exceed tens...
Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, et al.
While finetuning AI agents on interaction data, such as web browsing or tool use, improves their capabilities, it also introduces critical...
Nikoo Naghavian, Mostafa Tavassolipour
Vision-language models like CLIP demonstrate impressive zero-shot generalization but remain highly vulnerable to adversarial attacks. In this work,...
Abrar Shahid, Ibteeker Mahir Ishum, AKM Tahmidul Haque, et al.
This paper presents a controlled study of adversarial reinforcement learning in network security through a custom OpenAI Gym environment that models...
Lesly Miculicich, Mihir Parmar, Hamid Palangi, et al.
The deployment of autonomous AI agents in sensitive domains, such as healthcare, introduces critical risks to safety, security, and privacy. These...
Bowei Ning, Xuejun Zong, Kan He
Industrial control systems (ICS) are vital to modern infrastructure but increasingly vulnerable to cybersecurity threats, particularly through...
Chenpei Huang, Lingfeng Yao, Hui Zhong, et al.
Ear canal scanning/sensing (ECS) has emerged as a novel biometric authentication method for mobile devices paired with wireless earbuds. Existing...
Yuhao Sun, Zhuoer Xu, Shiwen Cui, et al.
Large Language Models (LLMs) have achieved remarkable progress across a wide range of tasks, but remain vulnerable to safety risks such as harmful...
Davide Gabrielli, Simone Sestito, Iacopo Masi
The current landscape of defensive mechanisms for LLMs is fragmented and underdeveloped, unlike prior work on classifiers. To further promote...
Zhaoyan Wang, Zheng Gao, Arogya Kharel, et al.
Graph Neural Networks (GNNs) are widely adopted in Web-related applications, serving as a core technique for learning from graph-structured data,...