Large Language Models (LLMs) have achieved remarkable success but remain highly susceptible to jailbreak attacks, in which adversarial prompts coerce...
AI agents are increasingly deployed across diverse domains to automate complex workflows through long-horizon, high-stakes action execution. Due...
Raja Sekhar Rao Dheekonda, Will Pearce, Nick Landers
AI systems are entering critical domains like healthcare, finance, and defense, yet remain vulnerable to adversarial attacks. While AI red teaming is...
We show that remotely hosted applications employing in-context learning, when augmented with a retrieval function to select in-context examples, can be...
Large language models (LLMs) employ safety mechanisms to prevent harmful outputs, yet these defenses primarily rely on semantic pattern matching. We...
Mario Rodríguez Béjar, Francisco J. Cortés-Delgado, S. Braghin +1 more
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety alignment and elicit harmful responses. A growing body of work...
Arne Roszeitis, Bartosz Burgiel, Victor Jüttner +1 more
Smart devices, such as light bulbs, TVs, and fridges, equipped with computing capabilities and wireless communication, are part of everyday life in...
Bring-Your-Own-Key (BYOK) agent architectures let users route LLM traffic through third-party relays, creating a critical integrity gap: a malicious...