ClawSafety: "Safe" LLMs, Unsafe Agents
Bowen Wei, Yunbei Zhang, Jinhao Pan +5 more
Personal AI agents like OpenClaw run with elevated privileges on users' local machines, where a single successful prompt injection can leak...
Tiankai Yang, Jiate Li, Yi Nian +5 more
LLM-based agents increasingly operate across repeated sessions, maintaining task states to ensure continuity. In many deployments, a single agent...
Manoj Parmar
World models -- learned internal simulators of environment dynamics -- are rapidly becoming foundational to autonomous decision-making in robotics,...
Yanting Wang, Wei Zou, Runpeng Geng +1 more
Large language models (LLMs) and their applications, such as agents, are highly vulnerable to prompt injection attacks. State-of-the-art prompt...
Kıvanç Kuzey Dikici, Serdar Kara, Semih Çağlar +2 more
As Large Language Models (LLMs) for code increasingly utilize massive, often non-permissively licensed datasets, evaluating data contamination...
Saeid Jamshidi, Negar Shahabi, Foutse Khomh +2 more
Software-Defined Networking (SDN) is increasingly adopted to secure Internet-of-Things (IoT) networks due to its centralized control and programmable...
Anubhab Sahu, Diptisha Samanta, Reza Soosahabi
System Instructions in Large Language Models (LLMs) are commonly used to enforce safety policies, define agent behavior, and protect sensitive...
Jingning Xu, Haochen Luo, Chen Liu
Vision-language models (VLMs) are vulnerable to adversarial image perturbations. Existing works based on adversarial training against task-specific...
Xiaoqi Li, Shipeng Ye, Wenkai Li +1 more
Smart contract vulnerabilities can cause substantial financial losses due to the immutability of code after deployment. While existing tools detect...
Yishun Wang, Wenkai Li, Xiaoqi Li +3 more
Smart contracts are self-executing programs that manage financial transactions on blockchain networks. Developers commonly rely on third-party code...
Jiaqing Li, Zhibo Zhang, Shide Zhou +3 more
Model merging has emerged as a powerful technique for combining specialized capabilities from multiple fine-tuned LLMs without additional training...
Yao Qin, Yangyang Yan, Jinhua Pang +1 more
The integration of Large Language Models (LLMs) into life sciences has catalyzed the development of "AI Scientists." However, translating these...
Aengus Lynch
Autonomous AI agents are being deployed with filesystem access, email control, and multi-step planning. This thesis contributes to four open problems...
Yanting Wang, Jinyuan Jia
The random subspace method has wide security applications, such as providing certified defenses against adversarial and backdoor attacks, and building...
Chong Xiang, Drew Zagieboylo, Shaona Ghosh +5 more
AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions...
Quanyan Zhu, Zhengye Han
This paper introduces a performative scenario optimization framework for decision-dependent chance-constrained problems. Unlike classical stochastic...
Xiao Qian, Shangjia Dong
Accurate prediction of evacuation behavior is critical for disaster preparedness, yet models trained in one region often fail elsewhere. Using a...
Edoardo Allegrini, Edoardo Di Paolo, Angelo Spognardi +1 more
BotVerse is a scalable, event-driven framework for high-fidelity social simulation using LLM-based agents. It addresses the ethical risks of studying...
Meiwen Ding, Song Xia, Chenqi Kong +1 more
Although multimodal large language models (MLLMs) are increasingly deployed in real-world applications, their instruction-following behavior leaves...
Yubo Cui, Xianchao Guan, Zijun Xiong +1 more
Pre-trained vision-language models (VLMs) exhibit strong zero-shot generalization but remain vulnerable to adversarial perturbations. Existing...