AI Security Research

2,589+ academic papers on AI security, attacks, and defenses

Total: 2,589
Attack: 998
Benchmark: 740
Defense: 355
Tool: 276
Survey: 147

Showing 1341–1360 of 1,931 papers

Benchmark MEDIUM

Gradient Structure Estimation under Label-Only Oracles via Spectral Sensitivity

Jun Liu, Leo Yu Zhang, Fengpeng Li +2 more

Hard-label black-box settings, where only top-1 predicted labels are observable, pose a fundamentally constrained yet practically important feedback...

3 months ago cs.LG cs.CR PDF

Attack MEDIUM

Building Production-Ready Probes For Gemini

János Kramár, Joshua Engels, Zheng Wang +4 more

Frontier language model capabilities are improving rapidly. We thus need stronger mitigations against bad actors misusing increasingly powerful...

3 months ago cs.LG cs.AI cs.CL PDF

Benchmark LOW

LLM-Assisted Pseudo-Relevance Feedback

David Otero, Javier Parapar

Query expansion is a long-standing technique to mitigate vocabulary mismatch in ad hoc Information Retrieval. Pseudo-relevance feedback methods, such...

3 months ago cs.IR PDF

Attack MEDIUM

LoRA as Oracle

Marco Arazzi, Antonino Nocera

Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical...

3 months ago cs.CR cs.AI PDF

Attack HIGH

SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation

Aiman Al Masoud, Marco Arazzi, Antonino Nocera

Retrieval-Augmented Generation (RAG) has attracted significant attention due to its ability to combine the generative capabilities of Large Language...

3 months ago cs.CR cs.AI PDF

Benchmark LOW

CoG: Controllable Graph Reasoning via Relational Blueprints and Failure-Aware Refinement over Knowledge Graphs

Yuanxiang Liu, Songze Li, Xiaoke Guo +4 more

Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities but often grapple with reliability challenges like hallucinations....

4 months ago cs.CL cs.LG PDF

Attack HIGH

AJAR: Adaptive Jailbreak Architecture for Red-teaming

Yipu Dou, Wang Yang

Large language model (LLM) safety evaluation is moving from content moderation to action security as modern systems gain persistent state, tool...

4 months ago cs.CR cs.CL PDF

Tool MEDIUM

Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents

Kaiyu Zhou, Yongsen Zheng, Yicheng He +5 more

The agent-tool interaction loop is a critical attack surface for modern Large Language Model (LLM) agents. Existing denial-of-service (DoS) attacks...

4 months ago cs.CR cs.AI PDF

Benchmark HIGH

Hidden-in-Plain-Text: A Benchmark for Social-Web Indirect Prompt Injection in RAG

Haoze Guo, Ziqi Wei

Retrieval-augmented generation (RAG) systems put more and more emphasis on grounding their responses in user-generated content found on the Web,...

4 months ago cs.CR cs.HC PDF

Attack HIGH

Serverless AI Security: Attack Surface Analysis and Runtime Protection Mechanisms for FaaS-Based Machine Learning

Chetan Pathade, Vinod Dhimam, Sheheryar Ahmad +1 more

Serverless computing has achieved widespread adoption, with over 70% of AWS organizations using serverless solutions [1]. Meanwhile, machine learning...

4 months ago cs.CR cs.AI PDF

Defense HIGH

Multi-Agent Taint Specification Extraction for Vulnerability Detection

Jonah Ghebremichael, Saastha Vasan, Saad Ullah +6 more

Static Application Security Testing (SAST) tools using taint analysis are widely viewed as providing higher-quality vulnerability detection results...

4 months ago cs.CR cs.SE PDF

Tool MEDIUM

SecMLOps: A Comprehensive Framework for Integrating Security Throughout the MLOps Lifecycle

Xinrui Zhang, Pincan Zhao, Jason Jaskolka +2 more

Machine Learning (ML) has emerged as a pivotal technology in the operation of large and complex systems, driving advancements in fields such as...

4 months ago cs.CR cs.SE PDF

Tool LOW

Institutional AI: A Governance Framework for Distributional AGI Safety

Federico Pierucci, Marcello Galisai, Marcantonio Syrnikov Bracale +6 more

As LLM-based systems increasingly operate as agents embedded within human social and technical systems, alignment can no longer be treated as a...

4 months ago cs.CY PDF

Defense HIGH

Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay

Hao Wang, Yanting Wang, Hao Li +2 more

Large Language Models (LLMs) have achieved remarkable capabilities but remain vulnerable to adversarial "jailbreak" attacks designed to bypass...

4 months ago cs.CR cs.CL PDF

Attack HIGH

Defending Large Language Models Against Jailbreak Attacks via In-Decoding Safety-Awareness Probing

Yinzhi Zhao, Ming Wang, Shi Feng +3 more

Large language models (LLMs) have achieved impressive performance across natural language tasks and are increasingly deployed in real-world...

4 months ago cs.AI cs.CL PDF

Defense LOW

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Xingjun Ma, Yixu Wang, Hengyuan Xu +18 more

The rapid evolution of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has driven major gains in reasoning, perception, and...

4 months ago cs.AI cs.CL cs.CV PDF

Attack MEDIUM

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

Christina Lu, Jack Gallagher, Jonathan Michala +2 more

Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training. We...

4 months ago cs.CL PDF

Survey MEDIUM

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

Yi Liu, Weizhe Wang, Ruitao Feng +5 more

The rise of AI agent frameworks has introduced agent skills, modular packages containing instructions and executable code that dynamically extend...

4 months ago cs.CR cs.AI cs.CL PDF

Attack MEDIUM

The Straight and Narrow: Do LLMs Possess an Internal Moral Path?

Luoming Hu, Jingjie Zeng, Liang Yang +1 more

Enhancing the moral alignment of Large Language Models (LLMs) is a critical challenge in AI safety. Current alignment techniques often act as...

4 months ago cs.CL PDF
