Read the Scene, Not the Script: Outcome-Aware Safety for LLMs
Rui Wu, Yihao Quan, Zeru Shi, et al.
Safety-aligned Large Language Models (LLMs) still show two dominant failure modes: they are easily jailbroken, or they over-refuse harmless inputs...
Rijha Safdar, Danyail Mateen, Syed Taha Ali, et al.
Artificial Intelligence (AI), and more specifically Large Language Models (LLMs), have demonstrated exceptional progress in multiple areas including...
Guangyu Shen, Siyuan Cheng, Xiangzhe Xu, et al.
Large Language Models (LLMs) can acquire deceptive behaviors through backdoor attacks, where the model executes prohibited actions whenever secret...
Jehyeok Yeon, Isha Chaudhary, Gagandeep Singh
Large language models (LLMs) are increasingly deployed in agentic systems where they map user intents to relevant external tools to fulfill a task. A...
Chengxiao Wang, Isha Chaudhary, Qian Hu, et al.
Large Language Models (LLMs) can produce catastrophic responses in conversational settings that pose serious risks to public safety and security....
Hangting Ye, Jinmeng Li, He Zhao, et al.
Existing anomaly detection (AD) methods for tabular data usually rely on some assumptions about anomaly patterns, leading to inconsistent performance...
Bumjun Kim, Dongjae Jeon, Dueun Kim, et al.
Diffusion large language models (dLLMs) have emerged as a promising alternative to autoregressive models, offering flexible generation orders and...
Tanqiu Jiang, Min Bai, Nikolaos Pappas, et al.
Vision-language model (VLM)-based web agents increasingly power high-stakes selection tasks like content recommendation or product ranking by...
Fatmazohra Rezkellah, Ramzi Dakhmouche
With the increasing adoption of Large Language Models (LLMs), more customization is needed to ensure privacy-preserving and safe generation. We...
Kartik Pandit, Sourav Ganguly, Arnesh Banerjee, et al.
Ensuring safety is a foundational requirement for large language models (LLMs). Achieving an appropriate balance between enhancing the utility of...
Imene Kerboua, Sahar Omidi Shayegan, Megh Thakkar, et al.
Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals; these pages often exceed tens...
Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, et al.
While finetuning AI agents on interaction data, such as web browsing or tool use, improves their capabilities, it also introduces critical...
Nikoo Naghavian, Mostafa Tavassolipour
Vision-language models like CLIP demonstrate impressive zero-shot generalization but remain highly vulnerable to adversarial attacks. In this work,...
Abrar Shahid, Ibteeker Mahir Ishum, AKM Tahmidul Haque, et al.
This paper presents a controlled study of adversarial reinforcement learning in network security through a custom OpenAI Gym environment that models...
Lesly Miculicich, Mihir Parmar, Hamid Palangi, et al.
The deployment of autonomous AI agents in sensitive domains, such as healthcare, introduces critical risks to safety, security, and privacy. These...
Bowei Ning, Xuejun Zong, Kan He
Industrial control systems (ICS) are vital to modern infrastructure but increasingly vulnerable to cybersecurity threats, particularly through...
Chenpei Huang, Lingfeng Yao, Hui Zhong, et al.
Ear canal scanning/sensing (ECS) has emerged as a novel biometric authentication method for mobile devices paired with wireless earbuds. Existing...
Yuhao Sun, Zhuoer Xu, Shiwen Cui, et al.
Large Language Models (LLMs) have achieved remarkable progress across a wide range of tasks, but remain vulnerable to safety risks such as harmful...
Davide Gabrielli, Simone Sestito, Iacopo Masi
The current landscape of defensive mechanisms for LLMs is fragmented and underdeveloped, unlike prior work on classifiers. To further promote...
Zhaoyan Wang, Zheng Gao, Arogya Kharel, et al.
Graph Neural Networks (GNNs) are widely adopted in Web-related applications, serving as a core technique for learning from graph-structured data,...