Attachment Styles and AI Chatbot Interactions Among College Students
Ziqi Lin, Taiyu Hou
The use of large language model (LLM)-based AI chatbots among college students has increased rapidly, yet little is known about how individual...
Zehao Liu, Xi Lin
Large Language Models (LLMs) have gained considerable popularity and are protected by increasingly sophisticated safety mechanisms. However, jailbreak...
Yueqiao Jin, Roberto Martinez-Maldonado, Dragan Gašević +1 more
Generative AI is increasingly embedded in collaborative learning, yet little is known about how AI personas shape learner agency when AI teammates...
Wei Qian, Chenxu Zhao, Yangyi Li +1 more
The rapid advancements in artificial intelligence (AI) have primarily focused on the process of learning from data to acquire knowledgeable learning...
Wang Bin, Ao Yang, Kedan Li +5 more
In the domain of software security testing, Directed Grey-Box Fuzzing (DGF) has garnered widespread attention for its efficient target localization...
Tung-Ling Li, Yuhao Wu, Hongliang Liu
Reward models and LLM-as-a-Judge systems are central to modern post-training pipelines such as RLHF, DPO, and RLAIF, where they provide scalar...
Yidong Chai, Yi Liu, Mohammadreza Ebrahimi +2 more
Social media platforms are plagued by harmful content such as hate speech, misinformation, and extremist rhetoric. Machine learning (ML) models are...
Abhivansh Gupta
As LLM-based agents grow more autonomous and multi-modal, ensuring they remain controllable, auditable, and faithful to deployer intent becomes...
Baolei Zhang, Minghong Fang, Zhuqing Liu +5 more
Federated Learning (FL) allows multiple clients to collaboratively train a model without sharing their private data. However, FL is vulnerable to...
Huixin Zhan
Genomic Foundation Models (GFMs), such as Evolutionary Scale Modeling (ESM), have demonstrated remarkable success in variant effect prediction....
Tomáš Souček, Pierre Fernandez, Hady Elsahar +5 more
Invisible watermarking is essential for tracing the provenance of digital content. However, training state-of-the-art models remains notoriously...
Nenad Tomašev, Matija Franklin, Julian Jacobs +2 more
AI safety and alignment research has predominantly been focused on methods for safeguarding individual AI systems, resting on the assumption of an...
Kai Hu, Abhinav Aggarwal, Mehran Khodabandeh +6 more
This paper introduces Jailbreak-Zero, a novel red teaming methodology that shifts the paradigm of Large Language Model (LLM) safety evaluation from a...
Xiao Li, Yue Li, Hao Wu +4 more
As large language models (LLMs) are increasingly adopted for code vulnerability detection, their reliability and robustness across diverse...
Himanshu Gharat, Himanshi Agrawal, Gourab K. Patro
Large Language Models (LLMs) have empowered AI agents with advanced capabilities for understanding, reasoning, and interacting across diverse tasks....
Hao Li, Yubing Ren, Yanan Cao +3 more
Benefiting from the superior capabilities of large language models in natural language understanding and generation, Embeddings-as-a-Service (EaaS)...
Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid +1 more
In this fast-evolving area of LLMs, our paper discusses the significant security risk presented by prompt injection attacks. It focuses on small...
Saksham Sahai Srivastava, Haoyu He
Large Language Model (LLM) agents increasingly rely on long-term memory and Retrieval-Augmented Generation (RAG) to persist experiences and refine...
Zhexi Lu, Hongliang Chi, Nathalie Baracaldo +3 more
Membership inference attacks (MIAs) pose a critical privacy threat to fine-tuned large language models (LLMs), especially when models are adapted to...
Hao Li, Yubing Ren, Yanan Cao +4 more
With the rapid development of cloud-based services, large language models (LLMs) have become increasingly accessible through various web platforms....