Attack HIGH
Yue Li, Xiao Li, Hao Wu +5 more
Large Language Models (LLMs) are increasingly used for automated software development, making their ability to preserve secure coding practices...
Yesterday cs.CR cs.SE
Attack HIGH
Huilin Zhou, Jian Zhao, Yilu Zhong +7 more
Red teaming is critical for uncovering vulnerabilities in Large Language Models (LLMs). While automated methods have improved scalability, existing...
Yesterday cs.LG cs.AI
Benchmark MEDIUM
Xia Hu, Zhenrui Yue, Brian Potetz +4 more
As current Multimodal Large Language Models rapidly saturate canonical visual reasoning benchmarks, a key question emerges: do these strong scores...
Yesterday cs.CV cs.AI
Attack MEDIUM
Ben Kereopa-Yorke, Guillermo Diaz, Holly Wright +3 more
We define Oracle Poisoning, an attack class in which an adversary corrupts a structured knowledge graph that AI agents query at runtime via tool-use...
2 days ago cs.CR cs.AI
Attack MEDIUM
Li Lixing
Modern large language models (LLMs) rely on system prompts to establish behavioral constraints and safety rules. Standard causal self-attention...
Survey HIGH
Monika Jotautaitė, Maria Angelica Martinez, Ollie Matthews +1 more
We introduce a red-teaming methodology that exposes harder-to-catch attacks for coding-agent monitors, suggesting that current practices may...
2 days ago cs.CR cs.AI
Benchmark MEDIUM
Huy Hoang Ha, Benoit Favre, Francois Portet
Large language models (LLMs) have saturated standard medical benchmarks that test factual recall, yet their ability to perform higher-order...
2 days ago cs.CL cs.AI
Benchmark MEDIUM
Jingshen Zhang, Bo Wang, Yanlin Fu +4 more
In this paper, we study emergent self-debiasing mechanisms against stereotypical content in Large Language Models (LLMs). Unlike traditional...
Attack HIGH
Yiyong Liu, Chia-Yi Hsu, Chun-Ying Huang +3 more
LLM-powered coding agents increasingly make software supply chain decisions. They generate imports, recommend packages, and write installation...
Tool MEDIUM
Michael A. Riegler, Inga Strümke
We present swarm-attack, an open-source adversarial testing framework in which multiple lightweight LLM agents coordinate through shared memory,...
2 days ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Yilin Zhang, Yingkai Hua, Chunyu Wei +2 more
Vision-language model (VLM) based web agents demonstrate impressive autonomous GUI interaction but remain vulnerable to deceptive interface elements....
2 days ago cs.AI cs.CR
Defense HIGH
Wenxin Tang, Xiang Zhang, Junliang Liu +11 more
Automated vulnerability detection is a fundamental task in software security, yet existing learning-based methods still struggle to capture the...
Benchmark HIGH
Shai Feldman, Yaniv Romano
Evaluating and predicting the performance of large language models (LLMs) in multi-turn conversational settings is critical yet computationally...
Attack MEDIUM
Isaac David, Arthur Gervais
Security updates create a short but important window in which defenders and attackers can compare vulnerable and patched software. Yet in many...
5 days ago cs.CR cs.AI
Benchmark HIGH
Mohammad Mamun, Mohamed Gaber, Scott Buffett +1 more
Language Model Agents (LMAs) are emerging as a powerful primitive for augmenting red-team operations. They can support attack planning, adversary...
Other LOW
Sneha Oram, Ojaswita Bhushan, Pushpak Bhattacharyya
In this work, we conduct an analysis to examine the consistency of Large Language Models (LLMs) with respect to their own generated responses in an...
Attack HIGH
Zeyuan Chen, Yihan Ma, Xinyue Shen +2 more
Large language models (LLMs) show strong performance across many applications, but their ability to memorize and potentially reveal training data...
Benchmark MEDIUM
Di Lu, Bo Zhang, Xiyuan Li +5 more
Self-hosted computer-use agents (SHCUAs), such as OpenClaw, combine natural-language interaction with direct access to host-side resources, including...
Defense LOW
Aleksandr Bowkis, Marie Davidsen Buhl, Jacob Pfau +1 more
A leading proposal for aligning artificial superintelligence (ASI) is to use AI agents to automate an increasing fraction of alignment research as...
Tool MEDIUM
Chengjie Wang, Jingzheng Wu, Xiang Ling +2 more
Large language models (LLMs) are now deeply involved in software development workflows, and the code they generate routinely includes third-party...
5 days ago cs.SE cs.AI