AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total: 2,529

Attack: 969

Benchmark: 729

Defense: 345

Tool: 272

Survey: 142

Showing 81–100 of 294 papers

Benchmark MEDIUM

Enhancing Agent Safety Judgment: Controlled Benchmark Rewriting and Analogical Reasoning for Deceptive Out-of-Distribution Scenarios

Zuoyu Zhang, Yancheng Zhu

Tool-using agent systems powered by large language models (LLMs) are increasingly deployed across web, app, operating-system, and transactional...

1 week ago cs.AI PDF

Benchmark MEDIUM

MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory

Yuhui Wang, Tanqiu Jiang, Jiacheng Liang +2 more

As large language model (LLM)-powered agents are increasingly deployed to perform complex, real-world tasks, they face a growing class of attacks...

1 week ago cs.CR cs.AI cs.CL PDF

Defense MEDIUM

Self-Mined Hardness for Safety Fine-Tuning

Prakhar Gupta, Garv Shah, Donghua Zhang

Safety fine-tuning of language models typically requires a curated adversarial dataset. We take a different approach: score each candidate prompt's...

1 week ago cs.LG cs.AI cs.CR PDF

Survey MEDIUM

When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

Javad Forough, Marios Kogias, Hamed Haddadi

Agentic AI systems, specifically LLM-driven agents that plan, invoke tools, maintain persistent memory, and delegate tasks to peer agents via...

1 week ago cs.CR cs.AI PDF

Attack MEDIUM

Dependency-Aware Privacy for Multi-turn Agents

Divyam Anshumaan, Sarthak Choudhary, Nils Palumbo +1 more

LLM agents release private data across multi-service interactions. Existing prompt sanitizers based on metric differential privacy treat each release...

1 week ago cs.CR PDF

Attack MEDIUM

PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization

Mingshuo Liu, Yiwei Zha, Min Chen

Browsing-enabled LLM assistants can fetch webpages and answer contact-seeking queries, creating a practical channel for scraping contact-style...

1 week ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Revisiting JBShield: Breaking and Rebuilding Representation-Level Jailbreak Defenses

Kemal Derya, Berk Sunar

Defending large language models (LLMs) against jailbreak attacks, such as Greedy Coordinate Gradient (GCG), remains a challenge, particularly under...

1 week ago cs.CR PDF

Benchmark LOW

Distributed Deep Variational Approach for Privacy-preserving Data Release

Zahir Alsulaimawi, Huaping Liu

Federated learning (FL) lets distributed nodes train a shared model without exchanging their raw data, but in privacy-sensitive deployments medical...

1 week ago cs.CR cs.LG PDF

Tool MEDIUM

Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense

Kerri Prinos, Lilianne Brush, Cameron Denton +5 more

Agentic systems involved in high-stake decision-making under adversarial pressure need formal guarantees not offered by existing approaches....

1 week ago cs.AI cs.CR eess.SY PDF

Attack HIGH

EvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphs

Ruichao Liang, Jing Chen, Xianglong Li +5 more

Smart contract vulnerabilities in Decentralized Finance cause billions of dollars in losses every year, yet the security community faces a...

1 week ago cs.CR cs.SE PDF

Tool MEDIUM

Autonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defense

Mingming Zha, Xiaofeng Wang

Autonomous LLM agents operate as long-running processes with persistent workspaces, memory files, scheduled task state, and messaging integrations....

1 week ago cs.CR PDF

Attack MEDIUM

Tool Use as Action: Towards Agentic Control in Mobile Core Networks

Purna Sai Garigipati, Onur Ayan, Kishor Chandra Joshi +1 more

Artificial Intelligence (AI) will play an essential role in 6G. It will fundamentally reshape the network architecture itself and drive major changes...

1 week ago cs.NI eess.SY PDF

Defense LOW

Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models

Sandra Arcos-Holzinger, Sarah M. Erfani, James Bailey +1 more

Self-supervised speech models (S3Ms) achieve strong downstream performance, yet their learned representations remain poorly understood under natural...

1 week ago eess.AS cs.CR cs.LG PDF

Benchmark LOW

An Empirical Study of Agent Skills for Healthcare: Practice, Gaps, and Governance

Gelei Xu, Ningzhi Tang, Xueyang Li +4 more

Healthcare automation is shaped by local procedures and organizational constraints, so agent capabilities rarely transfer unchanged across settings....

1 week ago cs.AI PDF

Survey LOW

AcademiClaw: When Students Set Challenges for AI Agents

Junjie Yu, Pengrui Lu, Weiye Si +75 more

Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of...

1 week ago cs.AI cs.CY PDF

Attack HIGH

ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming

Mario Rodríguez Béjar, Francisco J. Cortés-Delgado, S. Braghin +1 more

Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety alignment and elicit harmful responses. A growing body of work...

1 week ago cs.CL cs.CR PDF

Attack HIGH

Noisy Networks, Nosy Neighbors: Simple Privacy Attacks Against Residential Wireless Traffic

Arne Roszeitis, Bartosz Burgiel, Victor Jüttner +1 more

Smart devices, such as light bulbs, TVs, fridges, etc., equipped with computing capabilities and wireless communication, are part of everyday life in...

1 week ago cs.CR PDF

Attack MEDIUM

Fight Poison with Poison: Enhancing Robustness in Few-shot Machine-Generated Text Detection with Adversarial Training

Wenjing Duan, Qi Zhou, Yuanfan Li

Machine-generated text (MGT) detection is critical for regulating online information ecosystems, yet existing detectors often underperform in...

1 week ago cs.CR cs.CL PDF

Benchmark MEDIUM

Privacy Preserving Machine Learning Workflow: from Anonymization to Personalized Differential Privacy Budgets in Federated Learning

Judith Sáinz-Pardo Díaz, Álvaro López García

The growing development of artificial intelligence based solutions, together with privacy legislation, has driven the rise of the so-called privacy...

1 week ago cs.CR cs.AI PDF

Attack HIGH

APIOT: Autonomous Vulnerability Management Across Bare-Metal Industrial OT Networks

Adel ElZemity, Budi Arief, Shujun Li +6 more

Bare-metal operational technology (OT) devices -- especially the microcontrollers running Modbus/TCP and CoAP at the base of industrial control...

1 week ago cs.CR cs.AI PDF
