Intent-obfuscation-based jailbreak attacks on multimodal large language models (MLLMs) transform a harmful query into a concealed multimodal input to...
Wesley Hanwen Deng, Mingxi Yan, Sunnie S. Y. Kim +5 more
Recent developments in AI safety research have called for red-teaming methods that effectively surface potential risks posed by generative AI models,...
Retrieval-Augmented Generation (RAG) systems are vulnerable to knowledge base poisoning, yet existing attacks have been evaluated almost exclusively...
While Multimodal Large Language Models (MLLMs) are increasingly integrated with Retrieval-Augmented Generation (RAG) to mitigate hallucinations, the...
LLM watermarks must be detectable without compromising text quality, yet most existing schemes bias the next-token distribution and pay for detection...
Yiwei Zhang, Jeremiah Birrell, Reza Ebrahimi +3 more
Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors...
Large language models (LLMs) are known to be vulnerable to jailbreak attacks, which typically rely on carefully designed prompts containing explicit...
Large Language Models (LLMs) have achieved remarkable success but remain highly susceptible to jailbreak attacks, in which adversarial prompts coerce...
Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera +2 more
The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank...
AI agents are increasingly deployed across diverse domains to automate complex workflows through long-horizon and high-stakes action executions. Due...