AI Security Research

2,560+ academic papers on AI security, attacks, and defenses

Total

2,560

Attack

982

Benchmark

736

Defense

350

Tool

275

Survey

144

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 181–200 of 1,937 papers

Clear filters

Defense MEDIUM

Layerwise Convergence Fingerprints for Runtime Misbehavior Detection in Large Language Models

Nay Myat Min, Long H. Pham, Jun Sun

Large language models deployed at runtime can misbehave in ways that clean-data validation cannot anticipate: training-time backdoors lie dormant...

2 weeks ago cs.CR cs.AI cs.CL PDF

Benchmark MEDIUM

GAMMAF: A Common Framework for Graph-Based Anomaly Monitoring Benchmarking in LLM Multi-Agent Systems

Pablo Mateo-Torrejón, Alfonso Sánchez-Macián

The rapid integration of Large Language Models (LLMs) into Multi-Agent Systems (MAS) has significantly enhanced their collaborative problem-solving...

2 weeks ago cs.CR cs.AI cs.MA PDF

Survey MEDIUM

A Survey on Split Learning for LLM Fine-Tuning: Models, Systems, and Privacy Optimizations

Zihan Liu, Yizhen Wang, Rui Wang +2 more

Fine-tuning unlocks large language models (LLMs) for specialized applications, but its high computational cost often puts it out of reach for...

2 weeks ago cs.CR cs.CL cs.DC PDF

Attack MEDIUM

Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training

Mengnan Zhao, Lihe Zhang, Tianhang Zheng +2 more

Fast Adversarial Training (FAT) has attracted significant attention due to its efficiency in enhancing neural network robustness against adversarial...

2 weeks ago cs.LG cs.AI cs.CR PDF

Tool LOW

OS-SPEAR: A Toolkit for the Safety, Performance,Efficiency, and Robustness Analysis of OS Agents

Zheng Wu, Yi Hua, Zhaoyuan Huang +8 more

The evolution of Multimodal Large Language Models (MLLMs) has shifted the focus from text generation to active behavioral execution, particularly via...

2 weeks ago cs.CL PDF

Benchmark MEDIUM

GoAT-X: A Graph of Auditing Thoughts for Securing Token Transactions in Cross-Chain Contracts

Zijun Feng, Yuming Feng, Yu Wang +4 more

Cross-chain bridges, the critical infrastructure of the multi-chain ecosystem, have become a primary target for attackers, resulting in over $2.8...

2 weeks ago cs.CR PDF

Attack MEDIUM

Mitigating Error Amplification in Fast Adversarial Training

Mengnan Zhao, Lihe Zhang, Bo Wang +3 more

Fast Adversarial Training (FAT) has proven effective in enhancing model robustness by encouraging networks to learn perturbation-invariant...

2 weeks ago cs.LG cs.CR PDF

Benchmark MEDIUM

Dynamic Cyber Ranges

Víctor Mayoral-Vilches, María Sanz-Gómez, Francesco Balassone +6 more

As LLM-driven agents advance in cybersecurity, Jeopardy CTF benchmarks are approaching saturation and cyber ranges, the natural next evaluation...

2 weeks ago cs.CR PDF

Defense MEDIUM

Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing

Kaisheng Fan, Weizhe Zhang, Yishu Gao +2 more

Defending against backdoor attacks in large language models remains a critical practical challenge. Existing defenses mitigate these threats but...

2 weeks ago cs.CR cs.AI PDF

Attack HIGH

AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization

Zonghao Ying, Haozheng Wang, Jiangfan Liu +5 more

Large Language Model (LLM) agents are increasingly used to automate complex workflows, but integrating untrusted external data with privileged...

2 weeks ago cs.CR PDF

Tool LOW

Closing the Loop: A Software Framework for AI to Support Business Decision Making

Jeffrey Wong, Antoine Creux

Create an idea, prototype it, evaluate if users like it, then learn. It is the circle of business. If AI can operate in all parts of the circle, it...

2 weeks ago cs.SE cs.MS stat.AP PDF

Attack HIGH

Jailbreaking Frontier Foundation Models Through Intention Deception

Xinhe Wang, Katia Sycara, Yaqi Xie

Large (vision-)language models exhibit remarkable capability but remain highly susceptible to jailbreaking. Existing safety training approaches aim...

2 weeks ago cs.CR cs.AI cs.CL PDF

Benchmark MEDIUM

System-aware contextual digital twin for ICS anomaly diagnosis

Eungyu Woo, Yooshin Kim, Wonje Heo +1 more

Industrial Control Systems (ICS) integrate computing, physical processes, and communication to operate critical infrastructures such as power grids,...

2 weeks ago cs.CR PDF

Defense LOW

Disagreement as Signals: Dual-view Calibration for Sequential Recommendation Denoising

Sijia Li, Min Gao, Zongwei Wang +3 more

Sequential recommendation seeks to model the evolution of user interests by capturing temporal user intent and item-level transition patterns....

2 weeks ago cs.IR PDF

Survey MEDIUM

Poster: ClawdGo: Endogenous Security Awareness Training for Autonomous AI Agents

Jiaqi Li, Yang Zhao, Bin Sun +3 more

Autonomous AI agents deployed on platforms such as OpenClaw face prompt injection, memory poisoning, supply-chain attacks, and social engineering,...

2 weeks ago cs.CR cs.AI PDF

Benchmark HIGH

Evaluation of Prompt Injection Defenses in Large Language Models

Priyal Deep, Shane Emmons, Amy Fox +3 more

LLM-powered applications routinely embed secrets in system prompts, yet models can be tricked into revealing them. We built an adaptive attacker that...

2 weeks ago cs.CR cs.AI PDF

Tool MEDIUM

LLM-CEG: Extending the Classification Error Gauge Framework for Privacy Auditing of Large Language Models

Kato Mivule

This paper extends the Classification Error Gauge (x-CEG) framework, originally developed for measuring the privacy-utility trade-off in tabular...

2 weeks ago cs.CR PDF

Benchmark MEDIUM

Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

Qi Li, Bo Yin, Weiqi Huang +6 more

Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety...

2 weeks ago cs.RO PDF

Attack HIGH

Spore: Efficient and Training-Free Privacy Extraction Attack on LLMs via Inference-Time Hybrid Probing

Yu Cui, Ruiqing Yue, Hang Fu +6 more

With the wide adoption of personal AI assistants such as OpenClaw, privacy leakage in user interaction contexts with large language model (LLM)...

2 weeks ago cs.CR PDF

Attack LOW

Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

Rong Xiang

Recent evidence suggests that frontier AI systems can exhibit agentic misalignment, generating and executing harmful actions derived from internally...

2 weeks ago cs.AI cs.CR PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial