Soft Instruction De-escalation Defense
Nils Philipp Walter, Chawin Sitawarin, Jamie Hayes +2 more
Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an external environment; this makes them susceptible to...
Li An, Yujian Liu, Yepeng Liu +3 more
Watermarking has emerged as a promising solution for tracing and authenticating text generated by large language models (LLMs). A common approach to...
Nguyen Linh Bao Nguyen, Alsharif Abuadbba, Kristen Moore +1 more
The rapid advancement of generative models has enabled the creation of increasingly stealthy synthetic voices, commonly referred to as audio...
Zheng-Xin Yong, Stephen H. Bach
We discover a novel and surprising phenomenon of unintentional misalignment in reasoning language models (RLMs), which we call self-jailbreaking....
Soham Hans, Stacy Marsella, Sophia Hirschmann +1 more
Understanding adversarial behavior in cybersecurity has traditionally relied on high-level intelligence reports and manual interpretation of attack...
Austin Jia, Avaneesh Ramesh, Zain Shamsi +2 more
Retrieval-Augmented Generation (RAG) has emerged as the dominant architectural pattern to operationalize Large Language Model (LLM) usage in Cyber...
Antônio H. Ribeiro, David Vävinggren, Dave Zachariah +2 more
Adversarial training has emerged as a key technique to enhance model robustness against adversarial input perturbations. Many of the existing methods...
Ronghao Ni, Aidan Z. H. Yang, Min-Chien Hsu +5 more
Program analysis tools often produce large volumes of candidate vulnerability reports that require costly manual review, creating a practical...
Alyssa Gerhart, Balaji Iyangar
Adversarial attacks pose a severe risk to AI systems used in healthcare, capable of misleading models into dangerous misclassifications that can...
Wei Shao, Yuhao Wang, Rongguang He +2 more
Existing defence mechanisms have demonstrated significant effectiveness in mitigating rule-based Denial-of-Service (DoS) attacks, leveraging...
Mohamed Seif, Malcolm Egan, Andrea J. Goldsmith +1 more
AI-based sensing at wireless edge devices has the potential to significantly enhance Artificial Intelligence (AI) applications, particularly for...
Zhenghao Xu, Qin Lu, Qingru Zhang +9 more
The reward model (RM) plays a pivotal role in reinforcement learning with human feedback (RLHF) for aligning large language models (LLMs). However,...
Daniel Gilkarov, Ran Dubin
Pretrained deep learning model sharing holds tremendous value for researchers and enterprises alike. It allows them to apply deep learning by...
Chiyu Chen, Xinhao Song, Yunkai Chai +7 more
Vision-Language Models (VLMs) are increasingly deployed as autonomous agents to navigate mobile graphical user interfaces (GUIs). Operating in...
Wu Yichao, Wang Yirui, Ding Panpan +3 more
With the wide application of deep reinforcement learning (DRL) techniques in complex fields such as autonomous driving, intelligent manufacturing,...
Divyanshu Kumar, Shreyas Jena, Nitin Aravind Birur +3 more
Multimodal large language models (MLLMs) have achieved remarkable progress, yet remain critically vulnerable to adversarial attacks that exploit...
Yulong Chen, Yadong Liu, Jiawen Zhang +3 more
Large Language Models (LLMs), despite advances in safety alignment, remain vulnerable to jailbreak attacks designed to circumvent protective...
Wm. Matthew Kennedy, Cigdem Patlak, Jayraj Dave +10 more
AI systems have the potential to produce both benefits and harms, but without rigorous and ongoing adversarial evaluation, AI actors will struggle to...
Xin Lian, Kenneth D. Forbus
Despite the broad applicability of large language models (LLMs), their reliance on probabilistic inference makes them vulnerable to errors such as...
Tushar Nayan, Ziqi Zhang, Ruimin Sun
With the increasing deployment of Large Language Models (LLMs) on mobile and edge platforms, securing them against model extraction attacks has...