AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total: 2,529
Attack: 969
Benchmark: 729
Defense: 345
Tool: 272
Survey: 142

Showing 61–80 of 867 papers

Attack HIGH

Toward Efficient Membership Inference Attacks against Federated Large Language Models: A Projection Residual Approach

Guilin Deng, Silong Chen, Yuchuan Luo +6 more

Federated Large Language Models (FedLLMs) enable multiple parties to collaboratively fine-tune LLMs without sharing raw data, addressing challenges...

2 weeks ago cs.LG PDF

Attack HIGH

Adaptive Instruction Composition for Automated LLM Red-Teaming

Jesse Zymet, Andy Luo, Swapnil Shinde +2 more

Many approaches to LLM red-teaming leverage an attacker LLM to discover jailbreaks against a target. Several of them task the attacker with...

2 weeks ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

Yannis Belkhiter, Giulio Zizzo, Sergio Maffeis +2 more

The growth of agentic AI has drawn significant attention to function calling Large Language Models (LLMs), which are designed to extend the...

2 weeks ago cs.CR cs.AI cs.CL PDF

Benchmark HIGH

Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

Hanzhi Liu, Chaofan Shou, Xiaonan Liu +4 more

LLM agents have begun to find real security vulnerabilities that human auditors and automated fuzzers missed for decades, in source-available targets...

2 weeks ago cs.CR PDF

Attack HIGH

Towards Certified Malware Detection: Provable Guarantees Against Evasion Attacks

Nandakrishna Giri, Asmitha K. A., Serena Nicolazzo +2 more

Machine learning-based static malware detectors remain vulnerable to adversarial evasion techniques, such as metamorphic engine mutations. To address...

2 weeks ago cs.CR cs.LG PDF

Attack HIGH

Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks

Pranav Pallerla, Wilson Naik Bhukya, Bharath Vemula +1 more

Retrieval-augmented generation (RAG) systems are increasingly deployed in sensitive domains such as healthcare and law, where they rely on private,...

2 weeks ago cs.CR cs.AI PDF

Defense HIGH

Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning

Ronghao Ni, Mihai Christodorescu, Limin Jia

The rapidly evolving Node.js ecosystem currently includes millions of packages and is a critical part of modern software supply chains, making...

2 weeks ago cs.CR cs.AI cs.SE PDF

Tool HIGH

If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems

Jiamin Chang, Minhui Xue, Ruoxi Sun +3 more

Recent advances in embodied Vision-Language Agentic Systems (VLAS), powered by large vision-language models (LVLMs), enable AI systems to perceive...

3 weeks ago cs.CV cs.AI PDF

Benchmark HIGH

HarDBench: A Benchmark for Draft-Based Co-Authoring Jailbreak Attacks for Safe Human-LLM Collaborative Writing

Euntae Kim, Soomin Han, Buru Chang

Large language models (LLMs) are increasingly used as co-authors in collaborative writing, where users begin with rough drafts and rely on LLMs to...

3 weeks ago cs.CL PDF

Attack HIGH

STAR-Teaming: A Strategy-Response Multiplex Network Approach to Automated LLM Red Teaming

MinJae Jung, YongTaek Lim, Chaeyun Kim +3 more

While Large Language Models (LLMs) are widely used, they remain susceptible to jailbreak prompts that can elicit harmful or inappropriate responses....

3 weeks ago cs.CL PDF

Tool HIGH

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

Jiacheng Liang, Yao Ma, Tharindu Kumarage +5 more

Reinforcement Learning from Human Feedback (RLHF) is central to aligning Large Language Models (LLMs), yet it introduces a critical vulnerability: an...

3 weeks ago cs.AI cs.CR cs.LG PDF

Attack HIGH

An Empirical Study of Multi-Generation Sampling for Jailbreak Detection in Large Language Models

Hanrui Luo, Shreyank N Gowda

Detecting jailbreak behaviour in large language models remains challenging, particularly when strongly aligned models produce harmful outputs only...

3 weeks ago cs.CL cs.LG PDF

Attack HIGH

Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks

Md Rysul Kabir, Zoran Tiganj

Open-weight language models can be rendered unsafe through several distinct interventions, but the resulting models may differ substantially in...

3 weeks ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection

Thamilvendhan Munirathinam

Current open-source prompt-injection detectors converge on two architectural choices: regular-expression pattern matching and fine-tuned transformer...

3 weeks ago cs.CR cs.CL PDF

Attack HIGH

Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation

Wentao Zhang, Yan Zhuang, ZhuHang Zheng +3 more

Existing jamming attacks on Retrieval-Augmented Generation (RAG) systems typically induce explicit refusals or denial-of-service behaviors, which are...

3 weeks ago cs.CR cs.AI PDF

Attack HIGH

Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks

Jin Zhao, Marta Knežević, Tanja Käser

Large Language Models (LLMs) are increasingly used in education, yet their default helpfulness often conflicts with pedagogical principles. Prior...

3 weeks ago cs.CR cs.AI PDF

Benchmark HIGH

RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs

Parteek Jamwal, Minghao Shao, Boyuan Chen +15 more

Large Language Models (LLMs) have demonstrated remarkable capabilities across various cybersecurity tasks, including vulnerability classification,...

3 weeks ago cs.CR cs.AI cs.MA PDF

Benchmark HIGH

Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories

Ivan Bercovich, Ivgeni Segal, Kexun Zhang +3 more

We release Terminal Wrench, a subset of 331 terminal-agent benchmark environments, copied from the popular open benchmarks that are demonstrably...

3 weeks ago cs.CR cs.AI PDF

Attack HIGH

Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization

Haochun Tang, Yuliang Yan, Jiahua Lu +2 more

Cost-aware routing dynamically dispatches user queries to models of varying capability to balance performance and inference cost. However, the...

3 weeks ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection

Meng Chen, Kun Wang, Li Lu +2 more

Modern large audio-language models (LALMs) power intelligent voice interactions by tightly integrating audio and text. This integration, however,...

3 weeks ago cs.CR cs.AI cs.SD PDF
