Building Production-Ready Probes For Gemini
János Kramár, Joshua Engels, Zheng Wang +4 more
Frontier language model capabilities are improving rapidly. We thus need stronger mitigations against bad actors misusing increasingly powerful...
Marco Arazzi, Antonino Nocera
Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical...
Christina Lu, Jack Gallagher, Jonathan Michala +2 more
Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training. We...
Luoming Hu, Jingjie Zeng, Liang Yang +1 more
Enhancing the moral alignment of Large Language Models (LLMs) is a critical challenge in AI safety. Current alignment techniques often act as...
Feng Zhang, Shijia Li, Chunmao Zhang +7 more
User simulators serve as the critical interactive environment for agent post-training, and an ideal user simulator generalizes across domains and...
Renyang Liu, Kangjie Chen, Han Qiu +4 more
Image generation models (IGMs), while capable of producing impressive and creative content, often memorize a wide range of undesirable concepts from...
Abdelaziz Bounhar, Rania Hossam Elmohamady Elbadry, Hadi Abdine +3 more
Steering Large Language Models (LLMs) through activation interventions has emerged as a lightweight alternative to fine-tuning for alignment and...
Ruiqi Li, Zhiqiang Wang, Yunhao Yao +1 more
To standardize interactions between LLM-based agents and their environments, the Model Context Protocol (MCP) was proposed and has since been widely...
Chao Liu, Ngai-Man Cheung
3D Vision-Language Models (VLMs), such as PointLLM and GPT4Point, have shown strong reasoning and generalization abilities in 3D understanding tasks....
Zenghao Duan, Zhiyi Yin, Zhichao Shi +8 more
Large language models (LLMs) exhibit exceptional performance but pose inherent risks of generating toxic content, restricting their safe deployment....
Mizuki Sakai, Mizuki Yokoyama, Wakaba Tateishi +1 more
Large language models (LLMs) are increasingly used as autonomous agents in strategic and social interactions. Although recent studies suggest that...
Mohamed Nabeel, Oleksii Starov
According to Gartner, more than 70% of organizations will have integrated AI models into their workflows by the end of 2025. In order to reduce cost...
San Kim, Gary Geunbae Lee
Large Language Models (LLMs) have greatly advanced Natural Language Processing (NLP), particularly through instruction tuning, which enables broad...
Bocheng Chen, Xi Chen, Han Zi +5 more
Identifying specific moral errors in an input and generating appropriate corrections require moral sensitivity in large language models (LLMs), which...
Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo +1 more
Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model's...
Neusha Javidnia, Ruisi Zhang, Ashish Kundu +1 more
We present SWaRL, a robust and fidelity-preserving watermarking framework designed to protect the intellectual property of code LLM owners by...
Jiwei Guan, Haibo Jin, Haohan Wang
Recent advancements in Large Vision-Language Models (LVLMs) have shown groundbreaking capabilities across diverse multimodal tasks. However, these...
Davis Brown, Juan-Pablo Rivera, Dan Hendrycks +1 more
As frontier AIs become more powerful and costly to develop, adversaries have increasing incentives to steal model weights by mounting exfiltration...
Jiajie Zhu, Xia Du, Xiaoyuan Liu +4 more
The rapid advancements in artificial intelligence have significantly accelerated the adoption of speech recognition technology, leading to its...
Nandish Chattopadhyay, Abdul Basit, Amira Guesmi +3 more
Adversarial attacks pose a significant challenge to the reliable deployment of machine learning models in EdgeAI applications, such as autonomous...