Benchmark MEDIUM
Jinhu Qi, Yifan Li, Minghao Zhao +4 more
As agentic AI systems move beyond static question answering into open-ended, tool-augmented, and multi-step real-world workflows, their increased...
2 months ago cs.CL cs.DB
PDF
Tool MEDIUM
Zhuoshang Wang, Yubing Ren, Yanan Cao +3 more
While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring...
2 months ago cs.CR cs.CL
PDF
Defense MEDIUM
Yewon Han, Yumin Seol, EunGyung Kong +2 more
Existing jailbreak defence frameworks for Large Vision-Language Models often suffer from a safety-utility trade-off, where strengthening safety...
2 months ago cs.CV cs.AI
PDF
Attack MEDIUM
Ruyi Zhang, Heng Gao, Songlei Jian +2 more
Backdoor attacks compromise model reliability by using triggers to manipulate outputs. Trigger inversion can accurately locate these triggers via a...
2 months ago cs.CR cs.AI
PDF
Benchmark HIGH
Lidor Erez, Omer Hofman, Tamir Nizri +1 more
Automated LLM vulnerability scanners are increasingly used to assess security risks by measuring different attack type success rates (ASR). Yet the...
2 months ago cs.CR cs.PF
PDF
Benchmark LOW
Andrew Seohwan Yu, Mohsen Hariri, Kunio Nakamura +3 more
Vision language models (VLMs) have shown significant promise in visual grounding for images as well as videos. In medical imaging research, VLMs...
2 months ago cs.CV cs.LG
PDF
Defense LOW
Max Hellrigel-Holderbaum, Edward James Young
As AI systems advance in capabilities, measuring their safety and alignment to human values is becoming paramount. A fast-growing field of AI...
2 months ago cs.CY cs.AI cs.CL
PDF
Benchmark LOW
Xiaoya Lu, Yijin Zhou, Zeren Chen +6 more
Vision-Language Models (VLMs) empower embodied agents to execute complex instructions, yet they remain vulnerable to contextual safety risks where...
Defense MEDIUM
Suvadeep Hajra, Palash Nandi, Tanmoy Chakraborty
Safety tuning through supervised fine-tuning and reinforcement learning from human feedback has substantially improved the robustness of large...
Defense MEDIUM
Pengcheng Li, Jie Zhang, Tianwei Zhang +5 more
Safety alignment in large language models is typically evaluated under isolated queries, yet real-world use is inherently multi-turn. Although...
2 months ago cs.CR cs.AI
PDF
Tool MEDIUM
Ziling Zhou
AI agents dynamically acquire capabilities at runtime via MCP and A2A, yet no framework detects when capabilities change post-authorization. We term...
Tool MEDIUM
Ziling Zhou
AI agents dynamically acquire tools, orchestrate sub-agents, and transact across organizational boundaries, yet no existing security layer verifies...
Other LOW
Xiaowen Jiang, Andrei Semenov, Sebastian U. Stich
While spectral-based optimizers like Muon operate directly on the spectrum of updates, standard adaptive methods such as AdamW do not account for the...
2 months ago cs.LG math.OC
PDF
Attack HIGH
Maël Jenny, Jérémie Dentan, Sonia Vanier +1 more
Most jailbreak techniques for Large Language Models (LLMs) primarily rely on prompt modifications, including paraphrasing, obfuscation, or...
Attack HIGH
Chongxin Li, Hanzhang Wang, Lian Duan
Safety prompts constitute an interpretable layer of defense against jailbreak attacks in vision-language models (VLMs); however, their efficacy is...
Benchmark MEDIUM
Ivan Lopez, Selin S. Everett, Bryan J. Bunning +10 more
Large language models (LLMs) are entering clinician workflows, yet evaluations rarely measure how clinician reasoning shapes model behavior during...
2 months ago cs.HC cs.LG
PDF
Attack HIGH
Yiling Tao, Xinran Zheng, Shuo Yang +2 more
While large language model-based agents demonstrate great potential in collaborative tasks, their interactivity also introduces security...
Attack HIGH
Zijian Ling, Pingyi Hu, Xiuyong Gao +6 more
Speech-driven large language models (LLMs) are increasingly accessed through speech interfaces, introducing new security risks via open acoustic...
2 months ago cs.CR cs.AI cs.SD
PDF
Survey LOW
Zakia Zaman, Praveen Gauravaram, Mahbub Hassan +2 more
The rapid proliferation of the Internet of Things has intensified demand for robust privacy-preserving machine learning mechanisms to safeguard...
2 months ago cs.LG cs.CR
PDF