Defense MEDIUM
Vanshaj Khattar, Md Rafi ur Rashid, Moumita Choudhury +4 more
Test-time training (TTT) has recently emerged as a promising method to improve the reasoning abilities of large language models (LLMs), in which the...
1 week ago cs.LG cs.AI cs.CL
PDF
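As background for the entry above: a minimal sketch of the generic test-time training loop, in which the model takes a few gradient steps on a self-supervised next-token loss over the test prompt before answering. This illustrates the general TTT idea only, not this paper's method; the HuggingFace-style `model`/`tokenizer` interface, step count, and learning rate are all assumptions.

```python
import copy
import torch

def ttt_generate(model, tokenizer, prompt, ttt_steps=3, lr=1e-5):
    # Adapt a throwaway copy so every query starts from the base weights.
    adapted = copy.deepcopy(model)
    opt = torch.optim.AdamW(adapted.parameters(), lr=lr)
    ids = tokenizer(prompt, return_tensors="pt").input_ids

    adapted.train()
    for _ in range(ttt_steps):
        # Self-supervised next-token loss on the test prompt itself;
        # HF causal-LM models shift the labels internally.
        loss = adapted(ids, labels=ids).loss
        loss.backward()
        opt.step()
        opt.zero_grad()

    adapted.eval()
    with torch.no_grad():
        out = adapted.generate(ids, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```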
Survey MEDIUM
Kai Wang, Biaojie Zeng, Zeming Wei +7 more
With the rapid development of LLM-based multi-agent systems (MAS), their significant safety and security concerns have emerged, which introduce novel...
1 week ago cs.CR cs.AI cs.CL
PDF
Benchmark MEDIUM
Yu Pan, Wenlong Yu, Tiejun Wu +4 more
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. However, they remain highly susceptible to...
1 week ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Ye Wang, Jing Liu, Toshiaki Koike-Akino
The safety and reliability of vision-language models (VLMs) are a crucial part of deploying trustworthy agentic AI systems. However, VLMs remain...
1 week ago cs.LG cs.AI cs.CL
PDF
Benchmark MEDIUM
Yuhuan Liu, Haitian Zhong, Xinyuan Xia +3 more
Large Language Models (LLMs) often suffer from catastrophic forgetting and collapse during sequential knowledge editing. This vulnerability stems...
Benchmark MEDIUM
Jinhu Qi, Yifan Li, Minghao Zhao +4 more
As agentic AI systems move beyond static question answering into open-ended, tool-augmented, and multi-step real-world workflows, their increased...
1 week ago cs.CL cs.DB
PDF
Tool MEDIUM
Zhuoshang Wang, Yubing Ren, Yanan Cao +3 more
While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring...
1 week ago cs.CR cs.CL
PDF
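To make the coupling above concrete: a minimal green-list watermark sketch in the style of Kirchenbauer et al., where a secret key seeds a pseudorandom vocabulary partition at each decoding step; detection must recompute those partitions, so it needs the same key used at injection. This is a generic illustration, not this paper's scheme; the vocabulary size, green fraction, and z-threshold are arbitrary assumptions.

```python
import hashlib
import math
import random

VOCAB = 50_000   # assumed vocabulary size
GAMMA = 0.5      # fraction of the vocabulary marked "green" per step

def green_set(key: bytes, prev_token: int) -> set[int]:
    # Pseudorandom vocabulary partition keyed by (secret key, previous token).
    digest = hashlib.sha256(key + prev_token.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB), int(GAMMA * VOCAB)))

def detect(key: bytes, tokens: list[int], z_threshold: float = 4.0) -> bool:
    # At generation time the same green sets would have had their logits
    # boosted; detection counts how many emitted tokens land in their step's
    # green list and runs a z-test against the GAMMA null hypothesis.
    n = len(tokens) - 1
    if n <= 0:
        return False
    hits = sum(tok in green_set(key, prev)
               for prev, tok in zip(tokens, tokens[1:]))
    z = (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
    return z > z_threshold
```

Without the key, an auditor cannot rebuild the green lists, which is exactly the injection/detection coupling the abstract refers to.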
Defense MEDIUM
Yewon Han, Yumin Seol, EunGyung Kong +2 more
Existing jailbreak defence frameworks for Large Vision-Language Models often suffer from a safety-utility tradeoff, where strengthening safety...
1 week ago cs.CV cs.AI
PDF
Attack MEDIUM
Ruyi Zhang, Heng Gao, Songlei Jian +2 more
Backdoor attacks compromise model reliability by using triggers to manipulate outputs. Trigger inversion can accurately locate these triggers via a...
1 week ago cs.CR cs.AI
PDF
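For context on the entry above: a minimal trigger-inversion sketch in the classic Neural Cleanse style for image classifiers, optimizing a mask and pattern that flip clean inputs to a suspected target class; an unusually small, high-success patch is evidence of a planted trigger. Illustrative only; the frozen `model`, input shapes, and hyperparameters are assumptions, not this paper's approach.

```python
import torch
import torch.nn.functional as F

def invert_trigger(model, clean_x, target_class, steps=500, lam=1e-2):
    # clean_x: (N, C, H, W) batch of clean images; model stays frozen.
    mask = torch.zeros(1, 1, *clean_x.shape[2:], requires_grad=True)
    pattern = torch.zeros(1, *clean_x.shape[1:], requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=0.1)
    target = torch.full((clean_x.shape[0],), target_class, dtype=torch.long)

    for _ in range(steps):
        m = torch.sigmoid(mask)                      # keep the mask in [0, 1]
        stamped = (1 - m) * clean_x + m * torch.tanh(pattern)
        # Force the target label while penalizing mask area (L1), so the
        # optimizer prefers the smallest patch that still flips the batch.
        loss = F.cross_entropy(model(stamped), target) + lam * m.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask).detach(), torch.tanh(pattern).detach()
```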
Defense MEDIUM
Suvadeep Hajra, Palash Nandi, Tanmoy Chakraborty
Safety tuning through supervised fine-tuning and reinforcement learning from human feedback has substantially improved the robustness of large...
Defense MEDIUM
Pengcheng Li, Jie Zhang, Tianwei Zhang +5 more
Safety alignment in large language models is typically evaluated under isolated queries, yet real-world use is inherently multi-turn. Although...
1 week ago cs.CR cs.AI
PDF
Tool MEDIUM
Ziling Zhou
AI agents dynamically acquire capabilities at runtime via MCP and A2A, yet no framework detects when capabilities change post-authorization. We term...
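Since the abstract above names the gap rather than the fix, here is one hypothetical shape such a framework could take: pin a digest of the tool manifest at authorization time and re-verify it before every invocation, so silently added parameters or scopes are refused. All names and fields below are illustrative assumptions, not this paper's design and not part of MCP or A2A.

```python
import hashlib
import json

def manifest_digest(manifest: dict) -> str:
    # Canonical JSON so semantically identical manifests hash identically.
    blob = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()

class CapabilityPin:
    """Records what the user approved; refuses drifted capabilities."""

    def __init__(self):
        self._pins: dict[str, str] = {}

    def authorize(self, tool_name: str, manifest: dict) -> None:
        self._pins[tool_name] = manifest_digest(manifest)

    def check(self, tool_name: str, current_manifest: dict) -> None:
        pinned = self._pins.get(tool_name)
        if pinned is None:
            raise PermissionError(f"{tool_name} was never authorized")
        if manifest_digest(current_manifest) != pinned:
            raise PermissionError(
                f"{tool_name} capabilities changed since authorization; "
                "re-approval required")
```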
Tool MEDIUM
Ziling Zhou
AI agents dynamically acquire tools, orchestrate sub-agents, and transact across organizational boundaries, yet no existing security layer verifies...
Benchmark MEDIUM
Ivan Lopez, Selin S. Everett, Bryan J. Bunning +10 more
Large language models (LLMs) are entering clinician workflows, yet evaluations rarely measure how clinician reasoning shapes model behavior during...
1 week ago cs.HC cs.LG
PDF
Benchmark MEDIUM
Arjun Chakraborty, Sandra Ho, Adam Cook +1 more
CTI-REALM (Cyber Threat Real World Evaluation and LLM Benchmarking) is a benchmark designed to evaluate AI agents' ability to interpret cyber threat...
Attack MEDIUM
Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy
Large language models for code have achieved strong performance across diverse software analytics tasks, yet their real-world adoption remains...
Attack MEDIUM
Jianwei Li, Jung-Eun Kim
Backdoor attacks pose severe security threats to large language models (LLMs), where a model behaves normally under benign inputs but produces...
1 week ago cs.CR cs.AI cs.LG
PDF
Defense MEDIUM
Matthew Butler, Yi Fan, Christos Faloutsos
The proposed method (FraudFox) defends against adversarial attacks in a resource-constrained environment. We focus on questions like the...
1 week ago cs.CR cs.LG
PDF
Benchmark MEDIUM
Zhifang Zhang, Bojun Yang, Shuo He +5 more
Despite the strong multimodal performance, large vision-language models (LVLMs) are vulnerable during fine-tuning to backdoor attacks, where...
1 week ago cs.CV cs.CR
PDF
Defense MEDIUM
Zonghao Ying, Xiao Yang, Siyang Wu +7 more
The rapid evolution of Large Language Models (LLMs) into autonomous, tool-calling agents has fundamentally altered the cybersecurity landscape....