AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total: 2,529
Attack: 969
Benchmark: 729
Defense: 345
Tool: 272
Survey: 142

Showing 1–20 of 90 papers

Benchmark HIGH

LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments

Chiyu Zhang, Huiqin Yang, Bendong Jiang +8 more

The rapid proliferation of LLM-based autonomous agents in real operating system environments introduces a new category of safety risk beyond content...

Yesterday cs.CR cs.CL PDF

Benchmark HIGH

How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation

Shai Feldman, Yaniv Romano

Evaluating and predicting the performance of large language models (LLMs) in multi-turn conversational settings is critical yet computationally...

5 days ago cs.LG PDF

Benchmark HIGH

Autonomous Adversary: Red-Teaming in the age of LLM

Mohammad Mamun, Mohamed Gaber, Scott Buffett +1 more

Language Model Agents (LMAs) are emerging as a powerful primitive for augmenting red-team operations. They can support attack planning, adversary...

5 days ago cs.CR PDF

Benchmark HIGH

Evaluation of Prompt Injection Defenses in Large Language Models

Priyal Deep, Shane Emmons, Amy Fox +3 more

LLM-powered applications routinely embed secrets in system prompts, yet models can be tricked into revealing them. We built an adaptive attacker that...

2 weeks ago cs.CR cs.AI PDF

Benchmark HIGH

Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

Hanzhi Liu, Chaofan Shou, Xiaonan Liu +4 more

LLM agents have begun to find real security vulnerabilities that human auditors and automated fuzzers missed for decades, in source-available targets...

2 weeks ago cs.CR PDF

Benchmark HIGH

HarDBench: A Benchmark for Draft-Based Co-Authoring Jailbreak Attacks for Safe Human-LLM Collaborative Writing

Euntae Kim, Soomin Han, Buru Chang

Large language models (LLMs) are increasingly used as co-authors in collaborative writing, where users begin with rough drafts and rely on LLMs to...

3 weeks ago cs.CL PDF

Benchmark HIGH

RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs

Parteek Jamwal, Minghao Shao, Boyuan Chen +15 more

Large Language Models (LLMs) have demonstrated remarkable capabilities across various cybersecurity tasks, including vulnerability classification,...

3 weeks ago cs.CR cs.AI cs.MA PDF

Benchmark HIGH

Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories

Ivan Bercovich, Ivgeni Segal, Kexun Zhang +3 more

We release Terminal Wrench, a subset of 331 terminal-agent benchmark environments, copied from the popular open benchmarks that are demonstrably...

3 weeks ago cs.CR cs.AI PDF

Benchmark HIGH

PIArena: A Platform for Prompt Injection Evaluation

Runpeng Geng, Chenlong Yin, Yanting Wang +2 more

Prompt injection attacks pose serious security risks across a wide range of real-world applications. While receiving increasing attention, the...

1 month ago cs.CR cs.AI cs.CL PDF

Benchmark HIGH

PoC-Adapt: Semantic-Aware Automated Vulnerability Reproduction with LLM Multi-Agents and Reinforcement Learning-Driven Adaptive Policy

Phan The Duy, Nguyen Viet Duy, Khoa Ngo-Khanh +2 more

While recent approaches leverage large language models (LLMs) and multi-agent pipelines to automatically generate proof-of-concept (PoC) exploits...

1 month ago cs.CR PDF

Benchmark HIGH

Uncovering Linguistic Fragility in Vision-Language-Action Models via Diversity-Aware Red Teaming

Baoshun Tong, Haoran He, Ling Pan +2 more

Vision-Language-Action (VLA) models have achieved remarkable success in robotic manipulation. However, their robustness to linguistic nuances remains...

1 month ago cs.RO cs.CV PDF

Benchmark HIGH

AEGIS: From Clues to Verdicts -- Graph-Guided Deep Vulnerability Reasoning via Dialectics and Meta-Auditing

Sen Fang, Weiyuan Ding, Zhezhen Cao +2 more

Large Language Models (LLMs) are increasingly adopted for vulnerability detection, yet their reasoning remains fundamentally unsound. We identify a...

1 month ago cs.SE cs.AI cs.CR PDF

Benchmark HIGH

Machine Learning for Network Attacks Classification and Statistical Evaluation of Machine Learning for Network Attacks Classification and Adversarial Learning Methodologies for Synthetic Data Generation

Iakovos-Christos Zarkadis, Christos Douligeris

Supervised detection of network attacks has always been a critical part of network intrusion detection systems (NIDS). Nowadays, in a pivotal time...

1 month ago cs.CR cs.AI stat.AP PDF

Benchmark HIGH

When Scanners Lie: Evaluator Instability in LLM Red-Teaming

Lidor Erez, Omer Hofman, Tamir Nizri +1 more

Automated LLM vulnerability scanners are increasingly used to assess security risks by measuring different attack type success rates (ASR). Yet the...

1 month ago cs.CR cs.PF PDF

Benchmark HIGH

Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies

Siddharth Srikanth, Freddie Liang, Sophie Hsu +9 more

Vision-Language-Action (VLA) models have significant potential to enable general-purpose robotic systems for a range of vision-language tasks....

2 months ago cs.RO cs.AI cs.CL PDF

Benchmark HIGH

Patch Validation in Automated Vulnerability Repair

Zheng Yu, Wenxuan Shi, Xinqian Sun +3 more

Automated Vulnerability Repair (AVR) systems, especially those leveraging large language models (LLMs), have demonstrated promising results in...

2 months ago cs.SE PDF

Benchmark HIGH

JailNewsBench: Multi-Lingual and Regional Benchmark for Fake News Generation under Jailbreak Attacks

Masahiro Kaneko, Ayana Niwa, Timothy Baldwin

Fake news undermines societal trust and decision-making across politics, economics, health, and international relations, and in extreme cases...

2 months ago cs.LG cs.CL PDF

Benchmark HIGH

vEcho: A Paradigm Shift from Vulnerability Verification to Proactive Discovery with Large Language Models

Mingcheng Jiang, Jiancheng Huang, Jiangfei Wang +5 more

Static Application Security Testing (SAST) tools often suffer from high false positive rates, leading to alert fatigue that consumes valuable...

2 months ago cs.CR PDF

Benchmark HIGH

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

Zhicheng Fang, Jingjie Zheng, Chenxu Fu +1 more

Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare...

2 months ago cs.CR cs.AI cs.CL PDF
