Red Teaming Large Reasoning Models
Jiawei Chen, Yang Yang, Chao Yu +6 more
Large Reasoning Models (LRMs) have emerged as a powerful advancement in multi-step reasoning tasks, offering enhanced transparency and logical...
Mohammad M Maheri, Xavier Cadet, Peter Chin +1 more
Approximate machine unlearning aims to efficiently remove the influence of specific data points from a trained model, offering a practical...
Henry Onyeka, Emmanuel Samson, Liang Hong +3 more
The increasing complexity of IoT edge networks presents significant challenges for anomaly detection, particularly in identifying sophisticated...
Aayush Garg, Zanis Ali Khan, Renzo Degiovanni +1 more
Automated vulnerability patching is crucial for software security, and recent advancements in Large Language Models (LLMs) present promising...
Neemesh Yadav, Francesco Ortu, Jiarui Liu +5 more
Large Language Models (LLMs) are trained to refuse to respond to harmful content. However, systematic analyses of whether this behavior is truly a...
Tong Wu, Weibin Wu, Zibin Zheng
Equipped with various tools and knowledge, GPTs, customized AI agents built on OpenAI's large language models, have demonstrated great...
Fouad Trad, Ali Chehab
Few-shot prompting has emerged as a practical alternative to fine-tuning for leveraging the capabilities of large language models (LLMs) in...
Peng Kuang, Xiangxiang Wang, Wentao Liu +2 more
Multimodal Large Language Models (MLLMs) have achieved impressive performance in mathematical reasoning, yet they remain vulnerable to visual...
Kaixiang Wang, Zhaojiacheng Zhou, Bunyod Suvonov +2 more
Large Language Model (LLM)-based Multi-Agent Systems (MAS) are susceptible to linguistic attacks that can trigger cascading failures across the...
Anudeex Shetty
Large Language Models (LLMs) have demonstrated exceptional capabilities in natural language understanding and generation. Based on these LLMs,...
Zeng Wang, Minghao Shao, Akashdeep Saha +4 more
Graph neural networks (GNNs) have shown promise in hardware security by learning structural motifs from netlist graphs. However, this reliance on...
Abeer Matar A. Almalky, Ziyan Wang, Mohaiminul Al Nahian +2 more
In recent years, large language models (LLMs) have achieved substantial advancements and are increasingly integrated into critical applications...
Mohaiminul Al Nahian, Abeer Matar A. Almalky, Gamana Aragonda +6 more
Adversarial weight perturbation has emerged as a concerning threat to LLMs, in which attackers use either training privileges or system-level access to inject...
Boyuan Chen, Sitong Fang, Jiaming Ji +57 more
As intelligence increases, so does its shadow. AI deception, in which systems induce false beliefs to secure self-beneficial outcomes, has evolved...
Richard J. Young
Large Language Model (LLM) safety guardrail models have emerged as a primary defense mechanism against harmful content generation, yet their...
Tianyu Zhang, Zihang Xi, Jingyu Hua +1 more
In the realm of black-box jailbreak attacks on large language models (LLMs), the feasibility of constructing a narrow safety proxy, a lightweight...
Shaona Ghosh, Barnaby Simkin, Kyriacos Shiarlis +9 more
This paper introduces a dynamic and actionable framework for securing agentic AI systems in enterprise deployment. We contend that safety and...
Gauri Pradhan, Joonas Jälkö, Santiago Zanella-Béguelin +1 more
Training machine learning models with differential privacy (DP) limits an adversary's ability to infer sensitive information about the training data...
Junjian Wang, Lidan Zhao, Xi Sheryl Zhang
Ensuring the safety of embodied AI agents during task planning is critical for real-world deployment, especially in household environments where...
Rebeka Toth, Tamas Bisztray, Nils Gruschka
In this paper, we introduce a metadata-enriched generation framework (PhishFuzzer) that seeds real emails into Large Language Models (LLMs) to...