Tool MEDIUM
Joel Rorseth, Parke Godfrey, Lukasz Golab +2 more
This paper demonstrates RUBEN, an interactive tool for discovering minimal rules to explain the outputs of retrieval-augmented large language models...
Benchmark MEDIUM
Pedro Conde, Henrique Branquinho, Valerio Mazzone +3 more
AI pentesting agents are increasingly credible as offensive security systems, but current benchmarks still provide limited guidance on which will...
Yesterday cs.AI cs.CR
Benchmark MEDIUM
Saba Pourhanifeh, AbdulAziz AbdulGhaffar, Ashraf Matrawy
Large Language Models (LLMs) are increasingly explored for cybersecurity applications such as vulnerability detection. In the domain of threat...
Yesterday cs.CR cs.AI
Survey MEDIUM
Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri
The integration of Large Language Models (LLMs) into Electronic Design Automation (EDA) and hardware security is rapidly reshaping the semiconductor...
Yesterday cs.CR cs.AR cs.LG
Defense MEDIUM
Krishak Aneja, Manas Mittal, Anmol Goel +2 more
Fine-tuning Large Language Models (LLMs) on benign narrow data can sometimes induce broad harmful behaviors, a vulnerability termed emergent...
Yesterday cs.CL cs.AI
Benchmark MEDIUM
Qinghua Mao, Xi Lin, Jinze Gu +3 more
Large language models (LLMs) increasingly rely on knowledge editing to support knowledge-intensive reasoning, but this flexibility also introduces...
Yesterday cs.AI cs.CR
Benchmark MEDIUM
Xia Hu, Zhenrui Yue, Brian Potetz +4 more
As current Multimodal Large Language Models rapidly saturate canonical visual reasoning benchmarks, a key question emerges: do these strong scores...
Yesterday cs.CV cs.AI
Attack MEDIUM
Ben Kereopa-Yorke, Guillermo Diaz, Holly Wright +3 more
We define Oracle Poisoning, an attack class in which an adversary corrupts a structured knowledge graph that AI agents query at runtime via tool-use...
2 days ago cs.CR cs.AI
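The abstract above already names the moving parts of Oracle Poisoning: agents that query a structured knowledge graph through a tool at runtime, and an adversary who corrupts that graph. A minimal sketch of that attack surface, with all names (kg, lookup_tool, poison_edge) invented here rather than taken from the paper:

```python
# Illustrative sketch only, not the paper's setup. It shows the attack
# surface the abstract names: the agent trusts whatever the graph
# returns at query time, with no integrity check at the tool boundary.

# Knowledge graph stored as (subject, relation) -> object triples.
kg = {
    ("openssl-3.0.1", "patched_by"): "openssl-3.0.2",
    ("CVE-2022-0778", "affects"): "openssl-3.0.1",
}

def lookup_tool(subject: str, relation: str) -> str:
    """Tool the agent calls at runtime; returns the stored edge blindly."""
    return kg.get((subject, relation), "unknown")

def poison_edge(subject: str, relation: str, fake_object: str) -> None:
    """Adversary silently rewrites an edge the agent will later query."""
    kg[(subject, relation)] = fake_object

# Before poisoning, the agent gets the true answer...
assert lookup_tool("CVE-2022-0778", "affects") == "openssl-3.0.1"

# ...after poisoning, every downstream agent decision inherits the lie.
poison_edge("CVE-2022-0778", "affects", "none")
assert lookup_tool("CVE-2022-0778", "affects") == "none"
```

The point of the sketch is that the tool boundary hides the corruption: the agent sees only the returned value, never the edit.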
Attack MEDIUM
Li Lixing
Modern large language models (LLMs) rely on system prompts to establish behavioral constraints and safety rules. Standard causal self-attention...
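The abstract is cut off, but the mechanism it invokes is standard: under causal self-attention, token i may attend only to positions 0..i, so system-prompt tokens are readable by every later token yet carry no structural privilege over user tokens. A small NumPy illustration (the token counts here are made up):

```python
import numpy as np

# Standard causal mask: True means attention is allowed. System-prompt
# tokens occupy the earliest positions, so every later token can read
# them, but the mask itself treats them like any other early tokens.
n_sys, n_user = 4, 4          # 4 system-prompt tokens, then 4 user tokens
n = n_sys + n_user

mask = np.tril(np.ones((n, n), dtype=bool))   # token i attends to 0..i

# Every user token sees the full system prompt...
assert mask[n_sys:, :n_sys].all()
# ...and system tokens get no special weight, only an early position:
assert mask[0, 1:].sum() == 0
```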
Benchmark MEDIUM
Huy Hoang Ha, Benoit Favre, Francois Portet
Large language models (LLMs) have saturated standard medical benchmarks that test factual recall, yet their ability to perform higher-order...
2 days ago cs.CL cs.AI
Benchmark MEDIUM
Jingshen Zhang, Bo Wang, Yanlin Fu +4 more
In this paper, we study emergent self-debiasing mechanisms against stereotypical content in Large Language Models (LLMs). Unlike traditional...
Tool MEDIUM
Michael A. Riegler, Inga Strümke
We present swarm-attack, an open-source adversarial testing framework in which multiple lightweight LLM agents coordinate through shared memory,...
2 days ago cs.CR cs.AI cs.LG
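The coordination pattern named in the swarm-attack abstract, multiple lightweight agents sharing a memory of what has and has not worked, can be sketched in a few lines. This is a hypothetical illustration of the pattern, not the framework's actual API: Agent, propose, and shared_memory are invented names, and the random draw stands in for a real probe against a target model.

```python
import random

shared_memory: list[dict] = []   # findings visible to every agent

class Agent:
    def __init__(self, name: str, strategies: list[str]):
        self.name, self.strategies = name, strategies

    def propose(self) -> dict:
        # Skip strategies another agent already reported as blocked.
        blocked = {m["strategy"] for m in shared_memory if not m["worked"]}
        options = [s for s in self.strategies if s not in blocked]
        strategy = random.choice(options or self.strategies)
        finding = {"agent": self.name, "strategy": strategy,
                   "worked": random.random() < 0.3}  # stand-in for a real probe
        shared_memory.append(finding)   # publish so peers can adapt
        return finding

agents = [Agent("a1", ["roleplay", "encoding"]),
          Agent("a2", ["encoding", "payload-splitting"])]
for _ in range(4):
    for agent in agents:
        print(agent.propose())
```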
Benchmark MEDIUM
Yilin Zhang, Yingkai Hua, Chunyu Wei +2 more
Vision-language model (VLM) based web agents demonstrate impressive autonomous GUI interaction but remain vulnerable to deceptive interface elements....
2 days ago cs.AI cs.CR
Attack MEDIUM
Isaac David, Arthur Gervais
Security updates create a short but important window in which defenders and attackers can compare vulnerable and patched software. Yet in many...
5 days ago cs.CR cs.AI
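The patch-diffing window the abstract describes is easy to make concrete: once both versions are public, a plain textual diff points straight at the changed, and therefore likely vulnerable, code. A toy example with invented snippets:

```python
import difflib

# Both code snippets are fabricated for illustration only.
vulnerable = """\
def read_packet(buf, length):
    return buf[:length]          # no bounds check
"""
patched = """\
def read_packet(buf, length):
    length = min(length, len(buf))   # bounds check added by the patch
    return buf[:length]
"""

# The diff highlights exactly the line the patch touched, which is the
# signal both defenders and exploit writers race to interpret.
diff = difflib.unified_diff(vulnerable.splitlines(), patched.splitlines(),
                            fromfile="vulnerable", tofile="patched",
                            lineterm="")
print("\n".join(diff))
```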
Benchmark MEDIUM
Di Lu, Bo Zhang, Xiyuan Li +5 more
Self-hosted computer-use agents (SHCUAs), such as OpenClaw, combine natural-language interaction with direct access to host-side resources, including...
Tool MEDIUM
Chengjie Wang, Jingzheng Wu, Xiang Ling +2 more
Large language models (LLMs) are now deeply embedded in software development workflows, and the code they generate routinely includes third-party...
5 days ago cs.SE cs.AI
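One generic way a tool in this space might handle the third-party code problem is to enumerate the packages an LLM-generated snippet imports and check each against a registry or allowlist, which also catches hallucinated package names. This sketch is an assumption about the general approach, not the paper's method; KNOWN_PACKAGES stands in for a real registry query.

```python
import ast

KNOWN_PACKAGES = {"requests", "numpy"}   # stand-in for a registry lookup

def third_party_imports(generated_code: str) -> set[str]:
    """Collect top-level module names imported by the generated code."""
    tree = ast.parse(generated_code)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names

code = "import requests\nfrom totally_real_http import fetch\n"
for pkg in third_party_imports(code):
    status = "known" if pkg in KNOWN_PACKAGES else "UNKNOWN - possibly hallucinated"
    print(pkg, "->", status)
```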
Defense MEDIUM
Leo Linqian Gan, Jeffery Wu, Longyuan Ge +6 more
Autonomous LLM agents face a critical security risk known as workflow hijacking, where attackers subtly alter tool and skill invocations. Existing...
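One simple shape a defense against workflow hijacking can take is to pin every step of the agent's plan to an expected tool and argument schema and reject invocations that drift. The sketch below is illustrative under that assumption, not the paper's mechanism; EXPECTED_WORKFLOW and verify_invocation are invented names.

```python
# Each planned step is pinned to a (tool name, argument-key set) pair.
EXPECTED_WORKFLOW = [
    ("search_docs", {"query"}),
    ("summarize",   {"text"}),
    ("send_report", {"recipient", "body"}),
]

def verify_invocation(step: int, tool: str, args: dict) -> bool:
    """Reject a call whose tool or argument keys differ from the plan."""
    expected_tool, expected_args = EXPECTED_WORKFLOW[step]
    return tool == expected_tool and set(args) == expected_args

# A legitimate step passes; a hijacked step that swaps in an
# exfiltration tool is rejected before execution.
assert verify_invocation(0, "search_docs", {"query": "q3 earnings"})
assert not verify_invocation(2, "upload_file",
                             {"url": "http://evil", "body": "..."})
```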
Defense MEDIUM
Guoxin Lu, Letian Sha, Qing Wang +4 more
The safety alignment of Large Language Models (LLMs) remains vulnerable to Harmful Fine-tuning (HFT). While existing defenses impose constraints on...
5 days ago cs.CR cs.AI cs.CL
Benchmark MEDIUM
Qinfeng Li, Yuntai Bao, Jianghui Hu +5 more
LLM agents rely on prompts to implement task-specific capabilities based on foundation LLMs, making agent prompts valuable intellectual property....
5 days ago cs.CR cs.AI
Survey MEDIUM
Bonan Ruan, Yeqi Fu, Chuqi Zhang +3 more
GitHub Continuous Integration (CI) workflows increasingly integrate Large Language Models (LLMs) to automate review, triage, content generation, and...
5 days ago cs.CR cs.SE