AI Security Research

2,583+ academic papers on AI security, attacks, and defenses

Total

2,583

Attack

994

Benchmark

740

Defense

355

Tool

275

Survey

146

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 721–740 of 2,583 papers

Benchmark MEDIUM

Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation

Xiaoguang Li, Hanyi Wang, Yaowei Huang +6 more

Shuffler-based differential privacy (shuffle-DP) is a privacy paradigm providing high utility by involving a shuffler to permute noisy report from...

2 months ago cs.CR PDF

Attack MEDIUM

Good-Enough LLM Obfuscation (GELO)

Anatoly Belikov, Ilya Fedotov

Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV...

2 months ago cs.CR cs.LG PDF

Defense MEDIUM

ThaiSafetyBench: Assessing Language Model Safety in Thai Cultural Contexts

Trapoom Ukarapol, Nut Chukamphaeng, Kunat Pipatanakul +1 more

The safety evaluation of large language models (LLMs) remains largely centered on English, leaving non-English languages and culturally grounded...

2 months ago cs.CL PDF

Other LOW

Causally Robust Reward Learning from Reason-Augmented Preference Feedback

Minjune Hwang, Yigit Korkmaz, Daniel Seita +1 more

Preference-based reward learning is widely used for shaping agent behavior to match a user's preference, yet its sparse binary feedback makes it...

2 months ago cs.AI cs.LG cs.RO PDF

Benchmark MEDIUM

Osmosis Distillation: Model Hijacking with the Fewest Samples

Yuchen Shi, Huajie Chen, Heng Xu +6 more

Transfer learning is devised to leverage knowledge from pre-trained models to solve new tasks with limited data and computational resources....

2 months ago cs.CR cs.LG PDF

Attack HIGH

Multi-Paradigm Collaborative Adversarial Attack Against Multi-Modal Large Language Models

Yuanbo Li, Tianyang Xu, Cong Hu +3 more

The rapid progress of Multi-Modal Large Language Models (MLLMs) has significantly advanced downstream applications. However, this progress also...

2 months ago cs.CV PDF

Attack HIGH

Multi-Paradigm Collaborative Adversarial Attack Against Multi-Modal Large Language Models

Yuanbo Li, Tianyang Xu, Cong Hu +3 more

The rapid progress of Multi-Modal Large Language Models (MLLMs) has significantly advanced downstream applications. However, this progress also...

2 months ago cs.CV PDF

Survey MEDIUM

Design Behaviour Codes (DBCs): A Taxonomy-Driven Layered Governance Benchmark for Large Language Models

G. Madan Mohan, Veena Kiran Nambiar, Kiranmayee Janardhan

We introduce the Dynamic Behavioral Constraint (DBC) benchmark, the first empirical framework for evaluating the efficacy of a structured,...

2 months ago cs.AI PDF

Tool LOW

Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization

Furkan Mumcu, Yasin Yilmaz

As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to...

2 months ago cs.LG cs.AI cs.CR PDF

Attack MEDIUM

Efficient Refusal Ablation in LLM through Optimal Transport

Geraldin Nanfack, Eugene Belilovsky, Elvis Dohmatob

Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent...

2 months ago cs.LG cs.AI PDF

Benchmark MEDIUM

PTOPOFL: Privacy-Preserving Personalised Federated Learning via Persistent Homology

Kelly L Vomo-Donfack, Adryel Hoszu, Grégory Ginot +1 more

Federated learning (FL) faces two structural tensions: gradient sharing enables data-reconstruction attacks, while non-IID client distributions...

2 months ago cs.LG cs.CR cs.DC PDF

Survey LOW

When AI Fails, What Works? A Data-Driven Taxonomy of Real-World AI Risk Mitigation Strategies

Evgenija Popchanovska, Ana Gjorgjevikj, Maryan Rizinski +3 more

Large language models (LLMs) are increasingly embedded in high-stakes workflows, where failures propagate beyond isolated model errors into systemic...

2 months ago cs.CY cs.AI PDF

Benchmark MEDIUM

Code Fingerprints: Disentangled Attribution of LLM-Generated Code

Jiaxun Guo, Ziyuan Yang, Mengyu Sun +3 more

The rapid adoption of Large Language Models (LLMs) has transformed modern software development by enabling automated code generation at scale. While...

2 months ago cs.SE cs.CL PDF

Attack LOW

Bayesian Adversarial Privacy

Cameron Bell, Timothy Johnston, Antoine Luciano +1 more

Theoretical and applied research into privacy encompasses an incredibly broad swathe of differing approaches, emphasis and aims. This work introduces...

2 months ago math.ST cs.CR cs.LG PDF

Tool HIGH

CAM-LDS: Cyber Attack Manifestations for Automatic Interpretation of System Logs and Security Alerts

Max Landauer, Wolfgang Hotwagner, Thorina Boenke +2 more

Log data are essential for intrusion detection and forensic investigations. However, manual log analysis is tedious due to high data volumes,...

2 months ago cs.CR cs.AI PDF

Tool MEDIUM

A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality

Arther Tian, Alex Ding, Frank Chen +2 more

Decentralized large language model (LLM) inference networks can pool heterogeneous compute to scale serving, but they require lightweight and...

2 months ago cs.LG cs.AI cs.CR PDF

Attack MEDIUM

From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration

Yizhe Xie, Congcong Zhu, Xinyue Zhang +5 more

Large Language Model-based Multi-Agent Systems (LLM-MAS) are increasingly applied to complex collaborative scenarios. However, their collaborative...

2 months ago cs.MA cs.AI PDF

Attack HIGH

When Safety Becomes a Vulnerability: Exploiting LLM Alignment Homogeneity for Transferable Blocking in RAG

Junchen Li, Chao Qi, Rongzheng Wang +5 more

Retrieval-Augmented Generation (RAG) enhances the capabilities of large language models (LLMs) by incorporating external knowledge, but its reliance...

2 months ago cs.CR PDF

Attack HIGH

Structure-Aware Distributed Backdoor Attacks in Federated Learning

Wang Jian, Shen Hong, Ke Wei +1 more

While federated learning protects data privacy, it also makes the model update process vulnerable to long-term stealthy perturbations. Existing...

2 months ago cs.LG cs.AI cs.CR PDF

Benchmark MEDIUM

In-Context Environments Induce Evaluation-Awareness in Language Models

Maheep Chaudhary

Humans often become more self-aware under threat, yet can lose self-awareness when absorbed in a task; we hypothesize that language models exhibit...

2 months ago cs.AI cs.CL cs.LG PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial