Building Production-Ready Probes For Gemini
János Kramár, Joshua Engels, Zheng Wang +4 more
Frontier language model capabilities are improving rapidly. We thus need stronger mitigations against bad actors misusing increasingly powerful...
Marco Arazzi, Antonino Nocera
Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical...
Christina Lu, Jack Gallagher, Jonathan Michala +2 more
Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training. We...
Luoming Hu, Jingjie Zeng, Liang Yang +1 more
Enhancing the moral alignment of Large Language Models (LLMs) is a critical challenge in AI safety. Current alignment techniques often act as...
Feng Zhang, Shijia Li, Chunmao Zhang +7 more
User simulators serve as the critical interactive environment for agent post-training, and an ideal user simulator generalizes across domains and...
Renyang Liu, Kangjie Chen, Han Qiu +4 more
Image generation models (IGMs), while capable of producing impressive and creative content, often memorize a wide range of undesirable concepts from...
Abdelaziz Bounhar, Rania Hossam Elmohamady Elbadry, Hadi Abdine +3 more
Steering Large Language Models (LLMs) through activation interventions has emerged as a lightweight alternative to fine-tuning for alignment and...
Ruiqi Li, Zhiqiang Wang, Yunhao Yao +1 more
To standardize interactions between LLM-based agents and their environments, the Model Context Protocol (MCP) was proposed and has since been widely...
Chao Liu, Ngai-Man Cheung
3D Vision-Language Models (VLMs), such as PointLLM and GPT4Point, have shown strong reasoning and generalization abilities in 3D understanding tasks....
Zenghao Duan, Zhiyi Yin, Zhichao Shi +8 more
Large language models (LLMs) exhibit exceptional performance but pose inherent risks of generating toxic content, restricting their safe deployment....
Mizuki Sakai, Mizuki Yokoyama, Wakaba Tateishi +1 more
Large language models (LLMs) are increasingly used as autonomous agents in strategic and social interactions. Although recent studies suggest that...
Mohamed Nabeel, Oleksii Starov
According to Gartner, more than 70% of organizations will have integrated AI models into their workflows by the end of 2025. In order to reduce cost...
San Kim, Gary Geunbae Lee
Large Language Models (LLMs) have greatly advanced Natural Language Processing (NLP), particularly through instruction tuning, which enables broad...
Bocheng Chen, Xi Chen, Han Zi +5 more
Identifying specific moral errors in an input and generating appropriate corrections require moral sensitivity in large language models (LLMs), which...
Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo +1 more
Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model's...
Neusha Javidnia, Ruisi Zhang, Ashish Kundu +1 more
We present SWaRL, a robust and fidelity-preserving watermarking framework designed to protect the intellectual property of code LLM owners by...
Jiwei Guan, Haibo Jin, Haohan Wang
Recent advancements in Large Vision-Language Models (LVLMs) have shown groundbreaking capabilities across diverse multimodal tasks. However, these...
Davis Brown, Juan-Pablo Rivera, Dan Hendrycks +1 more
As frontier AIs become more powerful and costly to develop, adversaries have increasing incentives to steal model weights by mounting exfiltration...
Jiajie Zhu, Xia Du, Xiaoyuan Liu +4 more
The rapid advancements in artificial intelligence have significantly accelerated the adoption of speech recognition technology, leading to its...
Nandish Chattopadhyay, Abdul Basit, Amira Guesmi +3 more
Adversarial attacks pose a significant challenge to the reliable deployment of machine learning models in EdgeAI applications, such as autonomous...