Factor(T,U): Factored Cognition Strengthens Monitoring of Untrusted AI
Aaron Sandoval, Cody Rushing
The field of AI Control seeks to develop robust control protocols: deployment safeguards for untrusted AI which may be intentionally subversive...
Haowei Fu, Bo Ni, Han Xu +3 more
Retrieval-Augmented Generation (RAG) and Supervised Finetuning (SFT) have become the predominant paradigms for equipping Large Language Models (LLMs)...
Adeela Bashir, The Anh Han, Zia Ush Shamszaman
The integration of large language models (LLMs) into healthcare IoT systems promises faster decisions and improved medical support. LLMs are also...
Omar Farooq Khan Suri, John McCrae
Large Language Models (LLMs) are increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection...
Zihao Wang, Kar Wai Fok, Vrizlynn L. L. Thing
Multi-modal large language models (MLLMs), capable of processing text, images, and audio, have been widely adopted in various AI applications....
Mintong Kang, Chong Xiang, Sanjay Kariyappa +3 more
Indirect prompt injection attacks (IPIAs), where large language models (LLMs) follow malicious instructions hidden in input data, pose a critical...
Hao Wu, Prateek Saxena
This paper explores attacks and defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning...
K. J. Kevin Feng, Tae Soo Kim, Rock Yuren Pang +3 more
AI agents that take actions in their environment autonomously over extended time horizons require robust governance interventions to curb their...
Haoyu Shen, Weimin Lyu, Haotian Xu +1 more
Vision-Language Models (VLMs) have achieved impressive progress in multimodal text generation, yet their rapid adoption raises increasing concerns...
Mohammad M Maheri, Xavier Cadet, Peter Chin +1 more
Approximate machine unlearning aims to efficiently remove the influence of specific data points from a trained model, offering a practical...
Tong Wu, Weibin Wu, Zibin Zheng
Equipped with various tools and knowledge, GPTs, customized AI agents built on OpenAI's large language models, have demonstrated great...
Zeng Wang, Minghao Shao, Akashdeep Saha +4 more
Graph neural networks (GNNs) have shown promise in hardware security by learning structural motifs from netlist graphs. However, this reliance on...
Richard J. Young
Large Language Model (LLM) safety guardrail models have emerged as a primary defense mechanism against harmful content generation, yet their...
Tianyu Zhang, Zihang Xi, Jingyu Hua +1 more
In the realm of black-box jailbreak attacks on large language models (LLMs), the feasibility of constructing a narrow safety proxy, a lightweight...
Herman Errico, Jiquan Ngiam, Shanita Sojan
The Model Context Protocol (MCP) replaces static, developer-controlled API integrations with more dynamic, user-driven agent systems, which also...
Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley +3 more
The integration of artificial intelligence (AI) agents into web browsers introduces security challenges that go beyond traditional web application...
Jakub Hoscilowicz, Artur Janicki
We introduce the Adversarial Confusion Attack, a new class of threats against multimodal large language models (MLLMs). Unlike jailbreaks or targeted...
Sidahmed Benabderrahmane, James Cheney, Talal Rahwan
Advanced Persistent Threats (APTs) pose a significant challenge in cybersecurity due to their stealthy and long-term nature. Modern supervised...
Sen Nie, Jie Zhang, Jianxin Yan +2 more
Adversarial attacks have evolved from simply disrupting predictions on conventional task-specific models to the more complex goal of manipulating...
Steven Peh
Large Language Models (LLMs) remain vulnerable to prompt injection attacks, representing the most significant security threat in production...