AI Security Research

2,583+ academic papers on AI security, attacks, and defenses

Total

2,583

Attack

994

Benchmark

740

Defense

355

Tool

275

Survey

146

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 521–540 of 2,583 papers

Benchmark LOW

SkillProbe: Security Auditing for Emerging Agent Skill Marketplaces via Multi-Agent Collaboration

Zihan Guo, Zhiyu Chen, Xiaohang Nie +3 more

With the rapid evolution of Large Language Model (LLM) agent ecosystems, centralized skill marketplaces have emerged as pivotal infrastructure for...

1 months ago cs.CR cs.SE PDF

Attack MEDIUM

Detection of adversarial intent in Human-AI teams using LLMs

Abed K. Musaffar, Ambuj Singh, Francesco Bullo

Large language models (LLMs) are increasingly deployed in human-AI teams as support agents for complex tasks such as information retrieval,...

1 months ago cs.LG cs.AI cs.HC PDF

Defense MEDIUM

Alignment Whack-a-Mole : Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models

Xinyue Liu, Niloofar Mireshghallah, Jane C. Ginsburg +1 more

Frontier LLM companies have repeatedly assured courts and regulators that their models do not store copies of training data. They further rely on...

1 months ago cs.CL cs.AI cs.CY PDF

Tool MEDIUM

Before the Tool Call: Deterministic Pre-Action Authorization for Autonomous AI Agents

Uchi Uchibeke

AI agents today have passwords but no permission slips. They execute tool calls (fund transfers, database queries, shell commands, sub-agent...

1 months ago cs.CR cs.AI PDF

Survey HIGH

Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions

Shouqiao Wang, Marcello Politi, Samuele Marro +1 more

As agentic systems move into real-world deployments, their decisions increasingly depend on external inputs such as retrieved content, tool outputs,...

1 months ago cs.AI PDF

Benchmark LOW

BenchBench: Benchmarking Automated Benchmark Generation

Yandan Zheng, Haoran Luo, Zhenghong Lin +2 more

Benchmarks are the de facto standard for tracking progress in large language models (LLMs), yet static test sets can rapidly saturate, become...

1 months ago cs.CL PDF

Attack HIGH

Adversarial Attacks on Locally Private Graph Neural Networks

Matta Varun, Ajay Kumar Dhakar, Yuan Hong +1 more

Graph neural network (GNN) is a powerful tool for analyzing graph-structured data. However, their vulnerability to adversarial attacks raises serious...

1 months ago cs.LG cs.CR PDF

Benchmark HIGH

AEGIS: From Clues to Verdicts -- Graph-Guided Deep Vulnerability Reasoning via Dialectics and Meta-Auditing

Sen Fang, Weiyuan Ding, Zhezhen Cao +2 more

Large Language Models (LLMs) are increasingly adopted for vulnerability detection, yet their reasoning remains fundamentally unsound. We identify a...

1 months ago cs.SE cs.AI cs.CR PDF

Attack HIGH

ACRFence: Preventing Semantic Rollback Attacks in Agent Checkpoint-Restore

Yusheng Zheng, Yiwei Yang, Wei Zhang +1 more

LLM agent frameworks increasingly offer checkpoint-restore for error recovery and exploration, advising developers to make external tool calls safe...

1 months ago cs.CR PDF

Benchmark MEDIUM

Unveiling the Security Risks of Federated Learning in the Wild: From Research to Practice

Jiahao Chen, Zhiming Zhao, Yuwen Pu +4 more

Federated learning (FL) has attracted substantial attention in both academia and industry, yet its practical security posture remains poorly...

1 months ago cs.CR PDF

Benchmark MEDIUM

LJ-Bench: Ontology-Based Benchmark for U.S. Crime

Hung Yun Tseng, Wuzhen Li, Blerina Gkotse +1 more

The potential of Large Language Models (LLMs) to provide harmful information remains a significant concern due to the vast breadth of illegal queries...

1 months ago cs.LG PDF

Benchmark MEDIUM

The production of meaning in the processing of natural language

Christopher J. Agostino, Quan Le Thien, Nayan D'Souza +1 more

Understanding the fundamental mechanisms governing the production of meaning in the processing of natural language is critical for designing safe,...

1 months ago cs.CL cs.AI cs.HC PDF

Attack HIGH

Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models

Wenjing Hong, Zhonghua Rong, Li Wang +5 more

Large Language Models (LLMs) have been widely deployed, especially through free Web-based applications that expose them to diverse user-generated...

1 months ago cs.CR cs.AI PDF

Attack MEDIUM

Memory poisoning and secure multi-agent systems

Vicenç Torra, Maria Bras-Amorós

Memory poisoning attacks for Agentic AI and multi-agent systems (MAS) have recently caught attention. It is partially due to the fact that Large...

1 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

Fazhong Liu, Zhuoyan Chen, Tu Lan +6 more

Autonomous coding agents are increasingly integrated into software development workflows, offering capabilities that extend beyond code suggestion to...

1 months ago cs.CR cs.AI PDF

Benchmark LOW

What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time

Dong Yan, Jian Liang, Yanbo Wang +3 more

Test-Time Reinforcement Learning (TTRL) enables Large Language Models (LLMs) to enhance reasoning capabilities on unlabeled test streams by deriving...

1 months ago cs.LG cs.AI PDF

Defense LOW

Overreliance on AI in Information-seeking from Video Content

Anders Giovanni Møller, Elisa Bassignana, Francesco Pierri +1 more

The ubiquity of multimedia content is reshaping online information spaces, particularly in social media environments. At the same time, search is...

1 months ago cs.CY cs.CL cs.HC PDF

Attack MEDIUM

Graph-Aware Text-Only Backdoor Poisoning for Text-Attributed Graphs

Qi Luo, Minghui Xu, Dongxiao Yu +1 more

Many learning systems now use graph data in which each node also contains text, such as papers with abstracts or users with posts. Because these...

1 months ago cs.LG cs.CR PDF

Attack MEDIUM

Neural Uncertainty Principle: A Unified View of Adversarial Fragility and LLM Hallucination

Dong-Xiao Zhang, Hu Lou, Jun-Jie Zhang +2 more

Adversarial vulnerability in vision and hallucination in large language models are conventionally viewed as separate problems, each addressed with...

1 months ago cs.LG cs.IT physics.comp-ph PDF

Tool MEDIUM

A Framework for Formalizing LLM Agent Security

Vincent Siu, Jingxuan He, Kyle Montgomery +4 more

Security in LLM agents is inherently contextual. For example, the same action taken by an agent may represent legitimate behavior or a security...

1 months ago cs.CR cs.AI PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial