AJAR: Adaptive Jailbreak Architecture for Red-teaming
Yipu Dou, Wang Yang
Large language model (LLM) safety evaluation is moving from content moderation to action security as modern systems gain persistent state, tool...
2,077+ academic papers on AI security, attacks, and defenses
Chetan Pathade, Vinod Dhimam, Sheheryar Ahmad +1 more
Serverless computing has achieved widespread adoption, with over 70% of AWS organizations using serverless solutions [1]. Meanwhile, machine learning...
Yinzhi Zhao, Ming Wang, Shi Feng +3 more
Large language models (LLMs) have achieved impressive performance across natural language tasks and are increasingly deployed in real-world...
Christina Lu, Jack Gallagher, Jonathan Michala +2 more
Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training. We...
Luoming Hu, Jingjie Zeng, Liang Yang +1 more
Enhancing the moral alignment of Large Language Models (LLMs) is a critical challenge in AI safety. Current alignment techniques often act as...
Yuansen Liu, Yixuan Tang, Anthony Kum Hoe Tun
Current LLM safety research predominantly focuses on mitigating Goal Hijacking, preventing attackers from redirecting a model's high-level objective...
Murat Bilgehan Ertan, Marten van Dijk
Differentially Private Stochastic Gradient Descent (DP-SGD) is the dominant paradigm for private training, but its fundamental limitations under...
Hao Li, Yankai Yang, G. Edward Suh +2 more
Large Language Models (LLMs) have enabled the development of powerful agentic systems capable of automating complex workflows across various fields....
Oleg Brodt, Elad Feldman, Bruce Schneier +1 more
Prompt injection was initially framed as the large language model (LLM) analogue of SQL injection. However, over the past three years, attacks...
Zhiyi Mou, Jingyuan Yang, Zeheng Qian +6 more
While Large Language Models (LLMs) have powerful capabilities, they remain vulnerable to jailbreak attacks, which is a critical barrier to their safe...
Feng Zhang, Shijia Li, Chunmao Zhang +7 more
User simulators serve as the critical interactive environment for agent post-training, and an ideal user simulator generalizes across domains and...
Xiaonan Liu, Zhihao Li, Xiao Lan +3 more
Capture-the-Flag (CTF) competitions play a central role in modern cybersecurity as a platform for training practitioners and evaluating offensive and...
Fengchao Chen, Tingmin Wu, Van Nguyen +1 more
Large Language Models (LLMs) have enabled agents to move beyond conversation toward end-to-end task execution and become more helpful. However, this...
Renyang Liu, Kangjie Chen, Han Qiu +4 more
Image generation models (IGMs), while capable of producing impressive and creative content, often memorize a wide range of undesirable concepts from...
Abdelaziz Bounhar, Rania Hossam Elmohamady Elbadry, Hadi Abdine +3 more
Steering Large Language Models (LLMs) through activation interventions has emerged as a lightweight alternative to fine-tuning for alignment and...
Mohammed Himayath Ali, Mohammed Aqib Abdullah, Mohammed Mudassir Uddin +1 more
Large Language Models have emerged as transformative tools for Security Operations Centers, enabling automated log analysis, phishing triage, and...
Ruiqi Li, Zhiqiang Wang, Yunhao Yao +1 more
To standardize interactions between LLM-based agents and their environments, the Model Context Protocol (MCP) was proposed and has since been widely...
Xinyi Wu, Geng Hong, Yueyue Chen +5 more
Web agents, powered by large language models (LLMs), are increasingly deployed to automate complex web interactions. The rise of open-source...
Shawn Li, Chenxiao Yu, Zhiyu Ni +4 more
Large language models (LLMs) are increasingly deployed in security-sensitive applications, where they must follow system- or developer-specified...
Muhammad Wahid Akram, Keshav Sood, Muneeb Ul Hassan +1 more
Phishing with Quick Response (QR) codes is termed quishing. Attackers exploit this method to manipulate individuals into revealing their...