Capability-Oriented Training Induced Alignment Risk
Yujun Zhou, Yue Huang, Han Bao +8 more
While most AI alignment research focuses on preventing models from generating explicitly harmful content, a more subtle risk is emerging:...
Christian Rondanini, Barbara Carminati, Elena Ferrari +2 more
The proliferation of edge devices has created an urgent need for security solutions capable of detecting malware in real time while operating under...
Md Sazedur Rahman, Mizanur Rahman Jewel, Sanjay Madria
Mining is rapidly evolving into an AI-driven cyber-physical ecosystem where safety and operational reliability depend on robust perception,...
Adel ElZemity, Joshua Sylvester, Budi Arief +1 more
SMS-based phishing (smishing) attacks have surged, yet training effective on-device detectors requires labelled threat data that quickly becomes...
Enrico Ahlers, Daniel Passon, Yannic Noller +1 more
Machine learning models are increasingly present in our everyday lives; as a result, they become targets of adversarial attackers seeking to...
Zijing Xu, Ziwei Ning, Tiancheng Hu +4 more
The rapid evolution of cyber threats has highlighted significant gaps in security knowledge integration. Cybersecurity Knowledge Graphs (CKGs)...
Weichen Yu, Ravi Mangal, Yinyi Luo +4 more
Large Language Models are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging....
Kun Wang, Zherui Li, Zhenhong Zhou +8 more
Omni-modal Large Language Models (OLLMs) greatly expand LLMs' multimodal capabilities but also introduce cross-modal safety risks. However, a...
Oliver Daniels, Perusha Moodley, Benjamin M. Marlin +1 more
Alignment audits aim to robustly identify hidden goals from strategic, situationally aware misaligned models. Despite this threat model, existing...
Yu Fu, Haz Sameen Shahgir, Huanli Gong +3 more
Large language models (LLMs) increasingly combine long-context processing with advanced reasoning, enabling them to retrieve and synthesize...
Yukun Jiang, Hai Huang, Mingjie Li +3 more
By introducing routers to selectively activate experts in Transformer layers, the mixture-of-experts (MoE) architecture significantly reduces...
Shayan Ali Hassan, Tao Ni, Zafar Ayyub Qazi +1 more
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and generation. However, these...
Yunbei Zhang, Kai Mei, Ming Liu +5 more
We present the first large-scale empirical study of Moltbook, an AI-only social platform where 27,269 agents produced 137,485 posts and 345,580...
Chen Chen, Yuchen Sun, Jiaxin Gao +4 more
Large language models (LLMs) are increasingly deployed in security-sensitive applications, yet remain vulnerable to backdoor attacks. However,...
Hema Karnam Surendrababu, Nithin Nagaraj
Machine Learning (ML) models, including Large Language Models (LLMs), are characterized by a range of system-level attributes such as security and...
Rohan Subramanian Thomas, Shikhar Shiromani, Abdullah Chaudhry +4 more
Prompt design significantly impacts the moral competence and safety alignment of large language models (LLMs), yet empirical comparisons remain...
Zhenxiong Yu, Zhi Yang, Zhiheng Jin +19 more
As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security...
Jiacheng Liang, Yuhui Wang, Tanqiu Jiang +1 more
Mixture-of-Experts (MoE) language models introduce unique challenges for safety alignment due to their sparse routing mechanisms, which can enable...
Guang Yang, Xing Hu, Xiang Chen +1 more
Large language models (LLMs) for Verilog code generation are increasingly adopted in hardware design, yet remain vulnerable to backdoor attacks where...
Sidahmed Benabderrahmane, Petko Valtchev, James Cheney +1 more
Detecting rare and diverse anomalies in highly imbalanced datasets, such as Advanced Persistent Threats (APTs) in cybersecurity, remains a fundamental...