Mechanistic Origin of Moral Indifference in Language Models
Lingyu Li, Yan Teng, Yingchun Wang
Existing behavioral alignment techniques for Large Language Models (LLMs) often neglect the discrepancy between surface compliance and internal...
Yihao Zhang, Zeming Wei, Xiaokun Luan +7 more
Autonomous LLM-based agents increasingly operate as long-running processes forming densely interconnected multi-agent ecosystems, whose security...
Trishita Dhara, Siddhesh Sheth
Large language models are increasingly deployed in settings where relevant information is embedded within long and noisy contexts. Despite this,...
Zhenheng Tang, Xiang Liu, Qian Wang +3 more
As Large Language Models (LLMs) become more powerful and autonomous, they increasingly face conflicts and dilemmas in many scenarios. We first...
Simone Aonzo, Merve Sahin, Aurélien Francillon +1 more
Artificial intelligence (AI) systems are increasingly adopted as tool-using agents that can plan, observe their environment, and take actions over...
Taeyun Roh, Wonjune Jang, Junha Jung +1 more
Large language model agents heavily rely on external memory to support knowledge reuse and complex reasoning tasks. Yet most memory systems store...
Vanshaj Khattar, Md Rafi ur Rashid, Moumita Choudhury +4 more
Test-time training (TTT) has recently emerged as a promising method to improve the reasoning abilities of large language models (LLMs), in which the...
Kai Wang, Biaojie Zeng, Zeming Wei +7 more
With the rapid development of LLM-based multi-agent systems (MAS), their significant safety and security concerns have emerged, which introduce novel...
Yu Pan, Wenlong Yu, Tiejun Wu +4 more
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. However, they remain highly susceptible to...
Mateusz Dziemian, Maxwell Lin, Xiaohan Fu +28 more
LLM based agents are increasingly deployed in high stakes settings where they process external data sources such as emails, documents, and code...
Ye Wang, Jing Liu, Toshiaki Koike-Akino
The safety and reliability of vision-language models (VLMs) are a crucial part of deploying trustworthy agentic AI systems. However, VLMs remain...
Yuhuan Liu, Haitian Zhong, Xinyuan Xia +3 more
Large Language Models (LLMs) often suffer from catastrophic forgetting and collapse during sequential knowledge editing. This vulnerability stems...
Zhenlin Xu, Xiaogang Zhu, Yu Yao +2 more
Modern agentic systems allow Large Language Model (LLM) agents to tackle complex tasks through extensive tool usage, forming structured control flows...
Jinhu Qi, Yifan Li, Minghao Zhao +4 more
As agentic AI systems move beyond static question answering into open-ended, tool-augmented, and multi-step real-world workflows, their increased...
Zhuoshang Wang, Yubing Ren, Yanan Cao +3 more
While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring...
Yewon Han, Yumin Seol, EunGyung Kong +2 more
Existing jailbreak defence frameworks for Large Vision-Language Models often suffer from a safety utility tradeoff, where strengthening safety...
Ruyi Zhang, Heng Gao, Songlei Jian +2 more
Backdoor attacks compromise model reliability by using triggers to manipulate outputs. Trigger inversion can accurately locate these triggers via a...
Lidor Erez, Omer Hofman, Tamir Nizri +1 more
Automated LLM vulnerability scanners are increasingly used to assess security risks by measuring attack success rates (ASR) across different attack types. Yet the...
Andrew Seohwan Yu, Mohsen Hariri, Kunio Nakamura +3 more
Vision language models (VLMs) have shown significant promise in visual grounding for images as well as videos. In medical imaging research, VLMs...