TrojanPraise: Jailbreak LLMs via Benign Fine-Tuning
Zhixin Xie, Xurui Song, Jun Luo
The demand for customized large language models (LLMs) has led commercial LLM providers to offer black-box fine-tuning APIs, yet this convenience introduces...
Anirudh Sekar, Mrinal Agarwal, Rachel Sharma +4 more
Prompt injection attacks have become an increasingly common vulnerability for LLM applications, where adversarial prompts exploit indirect input channels such...
János Kramár, Joshua Engels, Zheng Wang +4 more
Frontier language model capabilities are improving rapidly. We thus need stronger mitigations against bad actors misusing increasingly powerful...
Marco Arazzi, Antonino Nocera
Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical...
Aiman Al Masoud, Marco Arazzi, Antonino Nocera
Retrieval-Augmented Generation (RAG) has attracted significant attention due to its ability to combine the generative capabilities of Large Language...
Yipu Dou, Wang Yang
Large language model (LLM) safety evaluation is moving from content moderation to action security as modern systems gain persistent state, tool...
Chetan Pathade, Vinod Dhimam, Sheheryar Ahmad +1 more
Serverless computing has achieved widespread adoption, with over 70% of AWS organizations using serverless solutions [1]. Meanwhile, machine learning...
Yinzhi Zhao, Ming Wang, Shi Feng +3 more
Large language models (LLMs) have achieved impressive performance across natural language tasks and are increasingly deployed in real-world...
Christina Lu, Jack Gallagher, Jonathan Michala +2 more
Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training. We...
Luoming Hu, Jingjie Zeng, Liang Yang +1 more
Enhancing the moral alignment of Large Language Models (LLMs) is a critical challenge in AI safety. Current alignment techniques often act as...
Yuansen Liu, Yixuan Tang, Anthony Kum Hoe Tun
Current LLM safety research predominantly focuses on mitigating Goal Hijacking, preventing attackers from redirecting a model's high-level objective...
Murat Bilgehan Ertan, Marten van Dijk
Differentially Private Stochastic Gradient Descent (DP-SGD) is the dominant paradigm for private training, but its fundamental limitations under...
Hao Li, Yankai Yang, G. Edward Suh +2 more
Large Language Models (LLMs) have enabled the development of powerful agentic systems capable of automating complex workflows across various fields....
Oleg Brodt, Elad Feldman, Bruce Schneier +1 more
Prompt injection was initially framed as the large language model (LLM) analogue of SQL injection. However, over the past three years, attacks...
Zhiyi Mou, Jingyuan Yang, Zeheng Qian +6 more
While Large Language Models (LLMs) have powerful capabilities, they remain vulnerable to jailbreak attacks, a critical barrier to their safe...
Feng Zhang, Shijia Li, Chunmao Zhang +7 more
User simulators serve as the critical interactive environment for agent post-training, and an ideal user simulator generalizes across domains and...
Xiaonan Liu, Zhihao Li, Xiao Lan +3 more
Capture-the-Flag (CTF) competitions play a central role in modern cybersecurity as a platform for training practitioners and evaluating offensive and...
Fengchao Chen, Tingmin Wu, Van Nguyen +1 more
Large Language Models (LLMs) have enabled agents to move beyond conversation toward end-to-end task execution and become more helpful. However, this...
Renyang Liu, Kangjie Chen, Han Qiu +4 more
Image generation models (IGMs), while capable of producing impressive and creative content, often memorize a wide range of undesirable concepts from...
Abdelaziz Bounhar, Rania Hossam Elmohamady Elbadry, Hadi Abdine +3 more
Steering Large Language Models (LLMs) through activation interventions has emerged as a lightweight alternative to fine-tuning for alignment and...