Factor(T,U): Factored Cognition Strengthens Monitoring of Untrusted AI
Aaron Sandoval, Cody Rushing
The field of AI Control seeks to develop robust control protocols, deployment safeguards for untrusted AI which may be intentionally subversive....
2,104+ academic papers on AI security, attacks, and defenses
Showing 1301–1320 of 2,049 papers
Clear filtersAaron Sandoval, Cody Rushing
The field of AI Control seeks to develop robust control protocols, deployment safeguards for untrusted AI which may be intentionally subversive....
Haowei Fu, Bo Ni, Han Xu +3 more
Retrieval-Augmented Generation (RAG) and Supervised Finetuning (SFT) have become the predominant paradigms for equipping Large Language Models (LLMs)...
Zahra Mahdavi, Zahra Khodakaramimaghsoud, Hooman Khaloo +4 more
Large vision-language models (LVLMs) are now central to healthcare applications such as medical visual question answering and imaging report...
Patrick Herter, Vincent Ahlrichs, Ridvan Açilan +1 more
Fuzzing is a highly effective method for uncovering software vulnerabilities, but analyzing the resulting data typically requires substantial manual...
Adeela Bashir, The Anh han, Zia Ush Shamszaman
The integration of large language models (LLMs) into healthcare IoT systems promises faster decisions and improved medical support. LLMs are also...
Rongzhe Wei, Peizhi Niu, Xinjie Shen +7 more
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. Existing approaches...
Xinyun Zhou, Xinfeng Li, Yinan Peng +9 more
Retrieval-Augmented Generation (RAG) systems are increasingly central to robust AI, enhancing large language model (LLM) faithfulness by...
Omar Farooq Khan Suri, John McCrae
Large Language Models (LLMs) are increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection...
Mihai Christodorescu, Earlence Fernandes, Ashish Hooda +11 more
In recent years, agentic artificial intelligence (AI) systems are becoming increasingly widespread. These systems allow agents to use various tools,...
Qingyuan Fei, Xin Liu, Song Li +4 more
Researchers have proposed numerous methods to detect vulnerabilities in JavaScript, especially those assisted by Large Language Models (LLMs)....
Zihao Wang, Kar Wai Fok, Vrizlynn L. L. Thing
Multi-modal large language models (MLLMs), capable of processing text, images, and audio, have been widely adopted in various AI applications....
Mintong Kang, Chong Xiang, Sanjay Kariyappa +3 more
Indirect prompt injection attacks (IPIAs), where large language models (LLMs) follow malicious instructions hidden in input data, pose a critical...
Jianxiang Zang, Yongda Wei, Ruxue Bai +5 more
Reliable reward models (RMs) are critical for ensuring the safe alignment of large language models (LLMs). However, current RM evaluation methods...
Cen Lu, Yung-Chen Tang, Andrea Cavallaro
Large Vision-Language Models (LVLMs) have shown impressive multimodal understanding capabilities, yet their robustness is poorly understood. In this...
Hao Wu, Prateek Saxena
This paper explores attacks and defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning...
K. J. Kevin Feng, Tae Soo Kim, Rock Yuren Pang +3 more
AI agents that take actions in their environment autonomously over extended time horizons require robust governance interventions to curb their...
Haoyu Shen, Weimin Lyu, Haotian Xu +1 more
Vision-Language Models (VLMs) have achieved impressive progress in multimodal text generation, yet their rapid adoption raises increasing concerns...
Yongyu Wang
Graph Neural Networks (GNNs) have emerged as a dominant paradigm for learning on graph-structured data, thanks to their ability to jointly exploit...
Yining Yuan, Yifei Wang, Yichang Xu +3 more
This paper presents LLMBugScanner, a large language model (LLM) based framework for smart contract vulnerability detection using fine-tuning and...
Juan A. Wibowo, George C. Polyzos
Background: Autonomous agents powered by Large Language Models (LLMs) are driving a paradigm shift toward an "Internet of Agents" (IoA). While...
Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.
Start 14-Day Free Trial