Tool MEDIUM
Herman Errico
As artificial intelligence systems evolve from passive assistants into autonomous agents capable of executing consequential actions, the security...
3 months ago cs.CR cs.AI
PDF
Benchmark LOW
Pei-Chi Pan, Yingbin Liang, Sen Lin
Large Language Models (LLMs) demonstrate transformative potential, yet their reasoning remains inconsistent and unreliable. Reinforcement learning...
Benchmark HIGH
Chaeyun Kim, YongTaek Lim, Kihyun Kim +2 more
Existing red-teaming benchmarks, when adapted to new languages via direct translation, fail to capture socio-technical vulnerabilities rooted in...
3 months ago cs.CY cs.AI
PDF
Attack HIGH
Georgios Syros, Evan Rose, Brian Grinstead +4 more
Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by directly interacting with websites and...
3 months ago cs.CR cs.AI
PDF
Attack HIGH
Kotekar Annapoorna Prabhu, Andrew Gan, Zahra Ghodsi
Machine learning relies on randomness as a fundamental component in various steps such as data sampling, data augmentation, weight initialization,...
3 months ago cs.CR cs.LG
PDF
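The abstract above lists the randomness sources a typical ML pipeline depends on. As a minimal sketch (the `seed_everything` helper is illustrative, not from the paper), controlling those sources usually means seeding every generator in play:

```python
import random

def seed_everything(seed: int) -> None:
    """Seed the randomness sources a typical training script touches."""
    random.seed(seed)  # Python-level data sampling / shuffling
    # Frameworks add their own generators; in practice one would also call
    # e.g. np.random.seed(seed) and torch.manual_seed(seed) here.

seed_everything(42)
shuffled = random.sample(range(10), k=10)

seed_everything(42)
assert shuffled == random.sample(range(10), k=10)  # bit-identical replay
```

If any of these generators is left unseeded (or can be influenced by an attacker, as the paper's threat model suggests), sampling, augmentation, and initialization all become attack surface.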
Benchmark LOW
Ashwin Sreevatsa, Sebastian Prasanna, Cody Rushing
The AI Control research agenda aims to develop control protocols: safety techniques that prevent untrusted AI systems from taking harmful actions...
3 months ago cs.CR cs.LG cs.SE
PDF
Benchmark MEDIUM
Yuting Ning, Jaylen Jones, Zhehao Zhang +5 more
Computer-use agents (CUAs) have made tremendous progress in the past year, yet they still frequently produce misaligned actions that deviate from the...
Attack HIGH
Yu Yan, Sheng Sun, Shengjia Cheng +3 more
Vision-Language Models (VLMs) with multimodal reasoning capabilities are high-value attack targets, given their potential for handling complex...
3 months ago cs.CR cs.AI
PDF
Attack HIGH
Suraj Ranganath, Atharv Ramesh
AI-text detectors face a critical robustness challenge: adversarial paraphrasing attacks that preserve semantics while evading detection. We...
3 months ago cs.LG cs.AI cs.CR
PDF
Defense MEDIUM
Oliver Daniels, Perusha Moodley, Benjamin M. Marlin +1 more
Alignment audits aim to robustly identify hidden goals from strategic, situationally aware misaligned models. Despite this threat model, existing...
Defense MEDIUM
Yu Fu, Haz Sameen Shahgir, Huanli Gong +3 more
Large language models (LLMs) increasingly combine long-context processing with advanced reasoning, enabling them to retrieve and synthesize...
3 months ago cs.CL cs.CR
PDF
Attack HIGH
Jona te Lintelo, Lichao Wu, Stjepan Picek
The rapid adoption of Mixture-of-Experts (MoE) architectures marks a major shift in the deployment of Large Language Models (LLMs). MoE LLMs improve...
Survey LOW
Shae McFadden, Myles Foley, Elizabeth Bates +5 more
Deep Reinforcement Learning (DRL) has achieved remarkable success in domains requiring sequential decision-making, motivating its application to...
3 months ago cs.LG cs.CR
PDF
Attack HIGH
Yanzhang Fu, Zizheng Guo, Jizhou Luo
Score-based query attacks pose a serious threat to deep learning models by crafting adversarial examples (AEs) using only black-box access to model...
3 months ago cs.LG cs.CR
PDF
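A score-based query attack of the kind the abstract describes observes only the model's output scores and greedily perturbs the input to drive the true-class score down. A minimal sketch in the spirit of coordinate-wise attacks such as SimBA (the `predict` oracle, `eps`, and `max_queries` are illustrative assumptions, not the paper's method):

```python
import random

def score_attack(predict, x, true_label, eps=0.1, max_queries=200):
    """Greedy random-search attack using only black-box score access:
    try +/-eps on a random coordinate, keep the step if the true-class
    score drops."""
    x = list(x)
    best = predict(x)[true_label]          # only output scores are observed
    for _ in range(max_queries):
        i = random.randrange(len(x))
        for sign in (eps, -eps):
            cand = list(x)
            cand[i] += sign
            score = predict(cand)[true_label]
            if score < best:               # keep any score-decreasing step
                x, best = cand, score
                break
    return x, best
```

Each iteration costs at most two queries, which is why query efficiency is the central metric for this attack family.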
Attack HIGH
Scott Thornton
Hybrid Retrieval-Augmented Generation (RAG) pipelines combine vector similarity search with knowledge graph expansion for multi-hop reasoning. We...
3 months ago cs.CR cs.IR cs.LG
PDF
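The hybrid pipeline the abstract names has two retrieval stages: dense vector similarity to find seed documents, then knowledge-graph expansion to pull in multi-hop neighbors. A minimal sketch under assumed toy data structures (dict-of-vectors index, adjacency-list graph; none of the names come from the paper):

```python
def hybrid_retrieve(query_vec, docs, graph, hops=1, top_k=2):
    """Hybrid RAG sketch: dense retrieval by dot-product similarity,
    then expand the hit set through knowledge-graph neighbors."""
    def sim(v, w):
        return sum(a * b for a, b in zip(v, w))

    # Stage 1: vector similarity search over the document index.
    hits = sorted(docs, key=lambda d: sim(query_vec, docs[d]), reverse=True)[:top_k]

    # Stage 2: multi-hop expansion over the knowledge graph.
    frontier, seen = set(hits), set(hits)
    for _ in range(hops):
        frontier = {n for d in frontier for n in graph.get(d, [])} - seen
        seen |= frontier
    return seen
```

The expansion stage is what makes this architecture interesting from a security standpoint: a poisoned graph edge can drag unrelated content into the context even when vector search is clean.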
Defense MEDIUM
Yukun Jiang, Hai Huang, Mingjie Li +3 more
By introducing routers to selectively activate experts in Transformer layers, the mixture-of-experts (MoE) architecture significantly reduces...
3 months ago cs.LG cs.AI cs.CR
PDF
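The routing mechanism the abstract refers to can be sketched in a few lines: for each token, the router scores all experts, keeps the top-k, and normalizes their gate weights so only those experts run. This is a generic top-k MoE router sketch, not the paper's specific design:

```python
import math

def top_k_route(logits, k=2):
    """Minimal MoE router: select the k highest-scoring experts and
    softmax-normalize their gate weights; all other experts get weight 0."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in idx]
    z = sum(exps)
    return {i: e / z for i, e in zip(idx, exps)}

gates = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)  # experts 1 and 3 selected
```

Because the router decides which parameters a given input ever touches, it is a natural target for the attacks and defenses this line of work studies.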
Benchmark LOW
Ahmed Salem, Andrew Paverd, Sahar Abdelnabi
Large language models (LLMs) are commonly treated as stateless: once an interaction ends, no information is assumed to persist unless it is...
3 months ago cs.LG cs.AI cs.CR
PDF
Benchmark MEDIUM
Igor Santos-Grueiro
Safety evaluation for advanced AI systems assumes that behavior observed under evaluation predicts behavior in deployment. This assumption weakens...
3 months ago cs.AI cs.CR cs.LG
PDF
Benchmark MEDIUM
Pouria Arefijamal, Mahdi Ahmadlou, Bardia Safaei +1 more
Federated learning (FL) is a decentralized learning paradigm widely adopted in resource-constrained Internet of Things (IoT) environments. These...
3 months ago cs.LG cs.CR cs.DC
PDF
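In the decentralized paradigm the abstract describes, clients train locally and a server only aggregates model updates, so raw IoT data never leaves the devices. A minimal FedAvg-style aggregation sketch (plain weighted averaging over flat weight vectors; an illustrative simplification, not the paper's protocol):

```python
def fed_avg(client_weights, client_sizes):
    """FedAvg sketch: average client weight vectors, weighted by each
    client's local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]

# Two equal-sized clients: the global model is the plain mean.
global_w = fed_avg([[1.0, 1.0], [3.0, 3.0]], [1, 1])  # -> [2.0, 2.0]
```

The weighted sum is also where FL's best-known weakness lives: a single client submitting poisoned weights shifts the global model in proportion to its claimed dataset size.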