On the Privacy of LLMs: An Ablation Study
Karima Makhlouf, Lamiaa Basyoni, Syed Khaderi +4 more
Large language models (LLMs) are increasingly deployed in interactive and retrieval-augmented settings, raising significant privacy concerns. While...
Ji Guo, Xiaolong Qin, Cencen Liu +3 more
Vision-Language Models (VLMs) have achieved remarkable success in tasks such as image captioning and visual question answering (VQA). However, as...
Mingyu Luo, Zihan Zhang, Zesen Liu +7 more
Bring-Your-Own-Key (BYOK) agent architectures let users route LLM traffic through third-party relays, creating a critical integrity gap: a malicious...
Wenwei Zhao, Xiaowen Li, Yao Liu +1 more
Federated learning (FL) is vulnerable to poisoning attacks, where malicious clients upload manipulated updates to degrade the performance of the...
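The abstract above describes poisoning attacks in which malicious clients upload manipulated updates. A minimal sketch of the underlying weakness (not this paper's specific attack; all values illustrative) shows how a single adversarial client can dominate plain federated averaging:

```python
import numpy as np

def fedavg(updates):
    """Plain federated averaging: the server takes the mean of client updates."""
    return np.mean(updates, axis=0)

rng = np.random.default_rng(1)
true_grad = np.ones(10)

# Nine honest clients send noisy versions of the true update...
honest = [true_grad + 0.1 * rng.standard_normal(10) for _ in range(9)]
# ...while one malicious client sends a large update in the opposite direction.
poisoned = [-50.0 * true_grad]

clean_avg = fedavg(honest)
attacked_avg = fedavg(honest + poisoned)

# The single poisoned update flips the sign of the aggregated direction.
print(clean_avg.mean() > 0, attacked_avg.mean() < 0)
```

Robust aggregation rules (e.g. coordinate-wise median or trimmed mean) are the usual mitigation, since the unweighted mean gives every client unbounded influence.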
Debeshee Das, Julien Piet, Darya Kaviani +3 more
Memory systems enable otherwise-stateless LLM agents to persist user information across sessions, but also introduce a new attack surface. We...
Sadia Asif, Mohammad Mohammadi Amiri
Fine-tuning safety-aligned language models for downstream tasks often leads to substantial degradation of refusal behavior, making models vulnerable...
Jiajia Li, Xiaoyu Wen, Zhongtian Ma +3 more
The growing capabilities of large language models (LLMs) have driven their widespread deployment across diverse domains, even in potentially...
George Fatouros, Georgios Makridis, John Soldatos +18 more
European financial institutions face mounting regulatory pressure while their security operations centres remain constrained not by data or staffing...
Mohd Ruhul Ameen, Md Takrim Ul Alam, Akif Islam
Static Application Security Testing tools help developers find security vulnerabilities before release, but they often produce many false positives....
Zhiyang Dai, Yansong Gao, Boyu Kuang +5 more
Contrastive learning (CL) reduces annotation cost via auto-derived supervisory signals. Since large-scale in-house CL datasets are infeasible,...
Huining Cui, Wei Liu
Retrieval-augmented generation (RAG) improves factual grounding by conditioning large language models on retrieved evidence, but it also opens a...
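The abstract above notes that RAG conditions a language model on retrieved evidence. A toy sketch of that conditioning step (the corpus, query, and overlap-based retriever are illustrative assumptions, not the paper's pipeline):

```python
# Hypothetical minimal RAG retrieval step: score documents by term overlap
# with the query, then prepend the best match to the model's prompt.
docs = [
    "The Eiffel Tower is in Paris.",
    "Photosynthesis converts light into chemical energy.",
    "Rust guarantees memory safety without a garbage collector.",
]

def retrieve(query, corpus):
    """Return the document sharing the most lowercase tokens with the query."""
    q = set(query.lower().split())
    return max(corpus, key=lambda d: len(q & set(d.lower().split())))

query = "where is the eiffel tower"
evidence = retrieve(query, docs)
prompt = f"Context: {evidence}\nQuestion: {query}\nAnswer:"
print(prompt)
```

The attack surface the abstract alludes to sits exactly here: whatever the retriever returns is injected into the prompt, so a corpus an adversary can write into becomes an indirect channel into the model's context.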
Yanting Wang, Chenlong Yin, Ying Chen +1 more
Long-context large language models (LLMs), for example Gemini-3.1-Pro and Qwen-3.5, are widely used to empower many real-world applications, such as...
Prashant Kulkarni
Multi-turn prompt injection follows a known attack path -- trust-building, pivoting, escalation -- but text-level defenses miss covert attacks where...
Bowen Sun, Chaozhuo Li, Yaodong Yang +2 more
Decompositional jailbreaks pose a critical threat to large language models (LLMs) by allowing adversaries to fragment a malicious objective into a...
Jona te Lintelo, Lichao Wu, Marina Krček +2 more
Mixture-of-Experts (MoE) architectures in Large Language Models (LLMs) have significantly reduced inference costs through sparse activation. However,...
Xiaokun Luan, Yihao Zhang, Pengcheng Su +2 more
Large Language Model (LLM) watermarking is crucial for establishing the provenance of machine-generated text, but most existing methods rely on a...
Han Liu, Shanghao Shi, Yevgeniy Vorobeychik +2 more
Low-Rank Adaptation (LoRA), which leverages the insight that model updates typically reside in a low-dimensional space, has significantly improved...
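The abstract above cites LoRA's core insight: model updates typically reside in a low-dimensional space, so the full weight matrix can stay frozen while two small factors are trained. A minimal sketch of that parameterization (dimensions and initializations are illustrative, not this paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 4  # full layer dimensions vs. the low rank r

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = 0.01 * rng.standard_normal((r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (zero init,
                                           # so the model starts unchanged)

x = rng.standard_normal(d_in)

# Forward pass: base output plus the low-rank update B @ A applied to x.
y = W @ x + B @ (A @ x)

# The update B @ A has rank at most r, so only r * (d_in + d_out)
# parameters are trained instead of d_in * d_out.
print(A.size + B.size, "trainable vs", W.size, "frozen")
```

With zero-initialized B the update starts as the zero matrix, so fine-tuning begins exactly at the pretrained model and only the 512 factor parameters (versus 4,096 full weights here) are optimized.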
Luyao Xu, Xiang Chen
Autonomous agent frameworks built upon large language models (LLMs) are evolving into complex, tool-integrated, and continuously operating systems,...
Zehui Tang, Yuchen Liu, Feihu Huang
Federated learning (FL) is a popular distributed learning paradigm in machine learning, which enables multiple clients to collaboratively train...
Zi Li, Tian Zhou, Wenze Li +3 more
Local fine-tuning datasets routinely contain sensitive secrets such as API keys, personal identifiers, and financial records. Although "local...