Attack HIGH
Xinhe Wang, Katia Sycara, Yaqi Xie
Large (vision-)language models exhibit remarkable capabilities but remain highly susceptible to jailbreaking. Existing safety training approaches aim...
2 weeks ago cs.CR cs.AI cs.CL
Benchmark MEDIUM
Eungyu Woo, Yooshin Kim, Wonje Heo +1 more
Industrial Control Systems (ICS) integrate computing, physical processes, and communication to operate critical infrastructures such as power grids,...
Defense LOW
Sijia Li, Min Gao, Zongwei Wang +3 more
Sequential recommendation seeks to model the evolution of user interests by capturing temporal user intent and item-level transition patterns....
Survey MEDIUM
Jiaqi Li, Yang Zhao, Bin Sun +3 more
Autonomous AI agents deployed on platforms such as OpenClaw face prompt injection, memory poisoning, supply-chain attacks, and social engineering,...
2 weeks ago cs.CR cs.AI
Benchmark HIGH
Priyal Deep, Shane Emmons, Amy Fox +3 more
LLM-powered applications routinely embed secrets in system prompts, yet models can be tricked into revealing them. We built an adaptive attacker that...
2 weeks ago cs.CR cs.AI
Tool MEDIUM
Kato Mivule
This paper extends the Classification Error Gauge (x-CEG) framework, originally developed for measuring the privacy-utility trade-off in tabular...
Benchmark MEDIUM
Qi Li, Bo Yin, Weiqi Huang +6 more
Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety...
Attack HIGH
Yu Cui, Ruiqing Yue, Hang Fu +6 more
With the wide adoption of personal AI assistants such as OpenClaw, privacy leakage in user interaction contexts with large language model (LLM)...
Attack LOW
Rong Xiang
Recent evidence suggests that frontier AI systems can exhibit agentic misalignment, generating and executing harmful actions derived from internally...
2 weeks ago cs.AI cs.CR
Tool HIGH
Yuchuan Zhao, Tong Chen, Junliang Yu +3 more
Large language model-powered sequential recommender systems (LLM-SRSs) have recently demonstrated remarkable performance, enabling recommendations...
Survey MEDIUM
Jialiang Wang, Yuchen Liu, Hang Xu +7 more
The volume of scientific submissions continues to climb, outpacing the capacity of qualified human referees and stretching editorial timelines. At...
Benchmark LOW
Pegah Khayatan, Jayneel Parekh, Arnaud Dapogny +3 more
Despite impressive progress in the capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs...
2 weeks ago cs.CV cs.AI cs.CL
Attack HIGH
Naheed Rayhan, Sohely Jahan
Large language models (LLMs) are increasingly integrated into sensitive workflows, raising the stakes for adversarial robustness and safety. This...
2 weeks ago cs.CR cs.AI
Attack HIGH
Zihan Wang, Rui Zhang, Yu Liu +4 more
LLM agents increasingly rely on skills to encapsulate reusable capabilities via progressively disclosed instructions. High-quality skills inject...
Attack HIGH
Jiali Wei, Ming Fan, Guoheng Sun +3 more
The growing application of large language models (LLMs) in safety-critical domains has raised urgent concerns about their security. Many recent...
2 weeks ago cs.CR cs.AI cs.CL
Tool HIGH
Run Hao, Zhuoran Tan
Model Context Protocol (MCP) is increasingly adopted for tool-integrated LLM agents, but its multi-layer design and third-party server ecosystem...
Other LOW
Arthur Douillard, Keith Rush, Yani Donchev +14 more
Modern large-scale language model pre-training relies heavily on the single program multiple data (SPMD) paradigm, which requires tight coupling...
Benchmark MEDIUM
Yuchen Shi, Xin Guo, Huajie Chen +3 more
Poisoning-based backdoor attacks pose significant threats to deep neural networks by embedding triggers in training data, causing models to...
2 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Vishal Rajput
We prove that empirical risk minimisation (ERM) imposes a necessary geometric constraint on learned representations: any encoder that minimises...
2 weeks ago cs.LG cs.AI cs.CV
Benchmark LOW
Yongcan Yu, Lingxiao He, Jian Liang +5 more
Test-time reinforcement learning (TTRL) adapts models at inference time via pseudo-labeling, leaving it vulnerable to spurious optimization...
2 weeks ago cs.LG cs.AI cs.CL