Tool MEDIUM
Zhuoshang Wang, Yubing Ren, Yanan Cao +3 more
While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring...
1 month ago cs.CR cs.CL
PDF
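The watermarking entry above hinges on detection and injection sharing a secret key. As a hedged illustration of that coupling (a generic Kirchenbauer-style green-list detector, not this paper's scheme), here is a minimal sketch; the hash construction, 50% green fraction, and z-score interpretation are assumptions made for the example.

```python
import hashlib
import math

def green_list_member(prev_token_id: int, token_id: int, secret_key: bytes,
                      green_fraction: float = 0.5) -> bool:
    """A token is 'green' if a keyed hash of (prev token, token) falls in the
    green fraction of the hash range. Injection and detection must derive the
    same partition, which is why both sides need the same secret key."""
    digest = hashlib.sha256(secret_key + prev_token_id.to_bytes(4, "big")
                            + token_id.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < green_fraction

def detect_watermark(token_ids: list[int], secret_key: bytes,
                     green_fraction: float = 0.5) -> float:
    """Return a z-score: how far the observed green-token count deviates
    from what unwatermarked text would produce by chance."""
    n = len(token_ids) - 1
    if n <= 0:
        return 0.0
    greens = sum(green_list_member(a, b, secret_key)
                 for a, b in zip(token_ids, token_ids[1:]))
    expected = green_fraction * n
    std = math.sqrt(n * green_fraction * (1 - green_fraction))
    return (greens - expected) / std

# Usage: z-scores above ~4 suggest the text was generated under this key.
```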
Defense MEDIUM
Yewon Han, Yumin Seol, EunGyung Kong +2 more
Existing jailbreak defence frameworks for Large Vision-Language Models often suffer from a safety-utility tradeoff, where strengthening safety...
1 month ago cs.CV cs.AI
PDF
Attack MEDIUM
Ruyi Zhang, Heng Gao, Songlei Jian +2 more
Backdoor attacks compromise model reliability by using triggers to manipulate outputs. Trigger inversion can accurately locate these triggers via a...
1 month ago cs.CR cs.AI
PDF
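Trigger inversion, mentioned in the entry above, is commonly done by optimizing a small input patch that flips predictions toward a suspected target label. The sketch below is a generic Neural Cleanse-style illustration, not this paper's method; the model interface, [-1, 1] input range, and regularization weight are assumptions.

```python
import torch

def invert_trigger(model, images, target_label, steps=200, lam=1e-3, lr=0.1):
    """Generic trigger inversion: optimize a mask and pattern so that stamping
    them onto clean inputs (assumed normalized to [-1, 1]) flips predictions
    to the target label, while an L1 penalty keeps the mask small."""
    mask = torch.zeros(1, 1, *images.shape[-2:], requires_grad=True)
    pattern = torch.zeros(1, *images.shape[1:], requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=lr)
    target = torch.full((images.shape[0],), target_label, dtype=torch.long)
    for _ in range(steps):
        m = torch.sigmoid(mask)            # keep mask values in (0, 1)
        stamped = (1 - m) * images + m * torch.tanh(pattern)
        loss = torch.nn.functional.cross_entropy(model(stamped), target)
        loss = loss + lam * m.abs().sum()  # prefer small, localized triggers
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.sigmoid(mask).detach(), torch.tanh(pattern).detach()

# An anomalously small recovered mask for one label hints at a planted trigger.
```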
Defense MEDIUM
Suvadeep Hajra, Palash Nandi, Tanmoy Chakraborty
Safety tuning through supervised fine-tuning and reinforcement learning from human feedback has substantially improved the robustness of large...
Defense MEDIUM
Pengcheng Li, Jie Zhang, Tianwei Zhang +5 more
Safety alignment in large language models is typically evaluated under isolated queries, yet real-world use is inherently multi-turn. Although...
1 month ago cs.CR cs.AI
PDF
Tool MEDIUM
Ziling Zhou
AI agents dynamically acquire capabilities at runtime via MCP and A2A, yet no framework detects when capabilities change post-authorization. We term...
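The entry above flags the absence of post-authorization capability-change detection for MCP/A2A agents. As a minimal sketch of the underlying check (assuming a JSON tool manifest and a hypothetical CapabilityPinner helper, not the paper's framework): fingerprint a tool's declared capabilities at authorization time, then re-verify before each invocation.

```python
import hashlib
import json

def manifest_fingerprint(manifest: dict) -> str:
    """Canonicalize and hash the capability-defining fields of a tool
    manifest (names, schemas, permissions) so any change is detectable."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

class CapabilityPinner:
    """Pin each tool's fingerprint when first authorized; any later change
    in declared capabilities is surfaced before the tool is invoked."""
    def __init__(self):
        self.pins: dict[str, str] = {}

    def authorize(self, tool_name: str, manifest: dict) -> None:
        self.pins[tool_name] = manifest_fingerprint(manifest)

    def verify(self, tool_name: str, manifest: dict) -> bool:
        return self.pins.get(tool_name) == manifest_fingerprint(manifest)

# Usage: pinner.authorize("search", manifest); later, refuse the call when
# pinner.verify("search", refetched_manifest) is False (capability drift).
```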
Tool MEDIUM
Ziling Zhou
AI agents dynamically acquire tools, orchestrate sub-agents, and transact across organizational boundaries, yet no existing security layer verifies...
Benchmark MEDIUM
Ivan Lopez, Selin S. Everett, Bryan J. Bunning +10 more
Large language models (LLMs) are entering clinician workflows, yet evaluations rarely measure how clinician reasoning shapes model behavior during...
1 month ago cs.HC cs.LG
PDF
Benchmark MEDIUM
Arjun Chakraborty, Sandra Ho, Adam Cook +1 more
CTI-REALM (Cyber Threat Real World Evaluation and LLM Benchmarking) is a benchmark designed to evaluate AI agents' ability to interpret cyber threat...
Attack MEDIUM
Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy
Large language models for code have achieved strong performance across diverse software analytics tasks, yet their real-world adoption remains...
Attack MEDIUM
Jianwei Li, Jung-Eun Kim
Backdoor attacks pose severe security threats to large language models (LLMs), where a model behaves normally under benign inputs but produces...
2 months ago cs.CR cs.AI cs.LG
PDF
Defense MEDIUM
Matthew Butler, Yi Fan, Christos Faloutsos
The proposed method, FraudFox, provides defenses against adversarial attacks in resource-constrained environments. We focus on questions like the...
2 months ago cs.CR cs.LG
PDF
Benchmark MEDIUM
Zhifang Zhang, Bojun Yang, Shuo He +5 more
Despite their strong multimodal performance, large vision-language models (LVLMs) are vulnerable during fine-tuning to backdoor attacks, where...
2 months ago cs.CV cs.CR
PDF
Defense MEDIUM
Zonghao Ying, Xiao Yang, Siyang Wu +7 more
The rapid evolution of Large Language Models (LLMs) into autonomous, tool-calling agents has fundamentally altered the cybersecurity landscape....
Tool MEDIUM
Jiangrong Wu, Zitong Yao, Yuhong Nan +1 more
Tool-augmented LLM agents increasingly rely on multi-step, multi-tool workflows to complete real tasks. This design expands the attack surface,...
2 months ago cs.SE cs.CR
PDF
Attack MEDIUM
Xiangkui Cao, Jie Zhang, Meina Kan +2 more
Large Vision-Language Models (LVLMs) have shown remarkable potential across a wide array of vision-language tasks, leading to their adoption in...
Benchmark MEDIUM
Ninghui Li, Kaiyuan Zhang, Kyle Polley +1 more
This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and...
2 months ago cs.LG cs.AI cs.CR
PDF
Attack MEDIUM
Haodong Zhao, Jinming Hu, Yijie Bai +6 more
Federated Language Model (FedLM) training enables collaborative learning without sharing raw data, yet it introduces a critical vulnerability, as every...
Benchmark MEDIUM
Junjie Chu, Yiting Qu, Ye Leng +4 more
Large Language Models (LLMs) are increasingly trained to align with human values, primarily at the task level, i.e., refusing to execute...
2 months ago cs.CR cs.AI
PDF
Tool MEDIUM
Frank Li
Tool-augmented LLM agents introduce security risks that extend beyond user-input filtering, including indirect prompt injection through fetched...
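Indirect prompt injection through fetched content, as the entry above notes, is often mitigated by marking retrieved text as untrusted data ("spotlighting"). The sketch below is a generic, hedged illustration; the tag names and regex patterns are assumptions, not this paper's defense.

```python
import re

# Illustrative patterns only; real deployments use learned or policy checks.
INSTRUCTION_PATTERNS = re.compile(
    r"ignore (all|previous) instructions|you are now|system prompt", re.I)

def quarantine_fetched(content: str, source_url: str) -> str:
    """Wrap fetched content in explicit data delimiters and flag
    instruction-like phrases, so downstream policy can treat the span
    as data rather than as commands addressed to the agent."""
    suspect = ' suspect="true"' if INSTRUCTION_PATTERNS.search(content) else ""
    return (f'<untrusted source="{source_url}"{suspect}>\n'
            f"{content}\n</untrusted>")

# Usage: feed quarantine_fetched(page_text, url) to the agent, never raw text.
```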