LLM Reinforcement in Context
Thomas Rivasseau
Current Large Language Model alignment research mostly focuses on improving model robustness against adversarial attacks and misbehavior by training...
Rathin Chandra Shit, Sharmila Subudhi
The security of autonomous vehicle networks is facing major challenges, owing to the complexity of sensor integration, real-time performance demands,...
JoonHo Lee, HyeonMin Cho, Jaewoong Yun +3 more
We present SGuard-v1, a lightweight safety guardrail for Large Language Models (LLMs), which comprises two specialized models to detect harmful...
Onkar Shelar, Travis Desell
Large Language Models remain vulnerable to adversarial prompts that elicit toxic content even after safety alignment. We present ToxSearch, a...
Yuting Tan, Yi Huang, Zhuo Li
Backdoor attacks on large language models (LLMs) typically couple a secret trigger to an explicit malicious output. We show that this explicit...
Thong Bach, Dung Nguyen, Thao Minh Le +1 more
Large language models exhibit systematic vulnerabilities to adversarial attacks despite extensive safety alignment. We provide a mechanistic analysis...
Sajad U P
Phishing and related cyber threats are becoming more varied and technologically advanced. Among these, email-based phishing remains the most dominant...
Shaowei Guan, Yu Zhai, Zhengyu Zhang +2 more
Large Language Models (LLMs) are increasingly vulnerable to adversarial attacks that can subtly manipulate their outputs. While various defense...
Shanmin Wang, Dongdong Zhao
Knowledge Distillation (KD) is essential for compressing large models, yet relying on pre-trained "teacher" models downloaded from third-party...
Lucas Fenaux, Christopher Srinivasa, Florian Kerschbaum
Transparency and security are both central to Responsible AI, but they may conflict in adversarial settings. We investigate the strategic effect of...
Yanbo Dai, Zongjie Li, Zhenlan Ji +1 more
Large language models (LLMs) have achieved remarkable success across a wide range of natural language processing tasks, demonstrating human-level...
Ruoxi Cheng, Haoxuan Ma, Teng Ma +1 more
Large Vision-Language Models (LVLMs) exhibit powerful reasoning capabilities but suffer from sophisticated jailbreak vulnerabilities. Fundamentally,...
Farhad Abtahi, Fernando Seoane, Iván Pau +1 more
Healthcare AI systems face major vulnerabilities to data poisoning that current defenses and regulations cannot adequately address. We analyzed eight...
Zichao Wei, Jun Zeng, Ming Wen +8 more
Software vulnerabilities are increasing at an alarming rate. However, manual patching is both time-consuming and resource-intensive, while existing...
Feilong Wang, Fuqiang Liu
The integration of large language models (LLMs) into automated driving systems has opened new possibilities for reasoning and decision-making by...
Guangke Chen, Yuhui Wang, Shouling Ji +2 more
Modern text-to-speech (TTS) systems, particularly those built on Large Audio-Language Models (LALMs), generate high-fidelity speech that faithfully...
Dennis Wei, Ronny Luss, Xiaomeng Hu +6 more
Large Language Models (LLMs) have become ubiquitous in everyday life and are entering higher-stakes applications ranging from summarizing meeting...
Fred Heiding, Simon Lermen
We present an end-to-end demonstration of how attackers can exploit AI safety failures to harm vulnerable populations: from jailbreaking LLMs to...
Jialin Wu, Kecen Li, Zhicong Huang +3 more
Many machine learning models are fine-tuned from large language models (LLMs) to achieve high performance in specialized domains like code...
Catherine Xia, Manar H. Alalfi
AI programming assistants have demonstrated a tendency to generate code containing basic security vulnerabilities. While developers are ultimately...