SecCodePRM: A Process Reward Model for Code Security
Weichen Yu, Ravi Mangal, Yinyi Luo +4 more
Large Language Models are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging....
Showing 481–500 of 2,077 papers
Tri Nguyen, Huy Hoang Bao Le, Lohith Srikanth Pentapalli +2 more
Detecting jailbreak attempts in clinical training large language models (LLMs) requires accurate modeling of linguistic deviations that signal unsafe...
Adriana Alvarado Garcia, Ruyuan Wan, Ozioma C. Oguine +1 more
Recently, red teaming, with roots in security, has become a key evaluative approach to ensure the safety and reliability of Generative Artificial...
George Tsigkourakos, Constantinos Patsakis
Static Application Security Testing (SAST) tools are integral to modern DevSecOps pipelines, yet tools like CodeQL, Semgrep, and SonarQube remain...
Jayesh Choudhari, Piyush Kumar Singh
Domain fine-tuning is a common path to deploy small instruction-tuned language models as customer-support assistants, yet its effects on...
Hayfa Dhabhi, Kashyap Thimmaraju
Large Language Models (LLMs) deploy safety mechanisms to prevent harmful outputs, yet these defenses remain vulnerable to adversarial prompts. While...
Kun Wang, Zherui Li, Zhenhong Zhou +8 more
Omni-modal Large Language Models (OLLMs) greatly expand LLMs' multimodal capabilities but also introduce cross-modal safety risks. However, a...
Zhenyu Xu, Victor S. Sheng
Protecting the intellectual property of large language models (LLMs) is a critical challenge due to the proliferation of unauthorized derivative...
Herman Errico
As artificial intelligence systems evolve from passive assistants into autonomous agents capable of executing consequential actions, the security...
Pei-Chi Pan, Yingbin Liang, Sen Lin
Large Language Models (LLMs) demonstrate transformative potential, yet their reasoning remains inconsistent and unreliable. Reinforcement learning...
Chaeyun Kim, YongTaek Lim, Kihyun Kim +2 more
Existing red-teaming benchmarks, when adapted to new languages via direct translation, fail to capture socio-technical vulnerabilities rooted in...
Georgios Syros, Evan Rose, Brian Grinstead +4 more
Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by directly interacting with web sites and...
Kotekar Annapoorna Prabhu, Andrew Gan, Zahra Ghodsi
Machine learning relies on randomness as a fundamental component in various steps such as data sampling, data augmentation, weight initialization,...
Ashwin Sreevatsa, Sebastian Prasanna, Cody Rushing
The AI Control research agenda aims to develop control protocols: safety techniques that prevent untrusted AI systems from taking harmful actions...
Yuting Ning, Jaylen Jones, Zhehao Zhang +5 more
Computer-use agents (CUAs) have made tremendous progress in the past year, yet they still frequently produce misaligned actions that deviate from the...
Yu Yan, Sheng Sun, Shengjia Cheng +3 more
Vision-Language Models (VLMs) with multimodal reasoning capabilities are high-value attack targets, given their potential for handling complex...
Suraj Ranganath, Atharv Ramesh
AI-text detectors face a critical robustness challenge: adversarial paraphrasing attacks that preserve semantics while evading detection. We...
Oliver Daniels, Perusha Moodley, Benjamin M. Marlin +1 more
Alignment audits aim to robustly identify hidden goals from strategic, situationally aware misaligned models. Despite this threat model, existing...
Yu Fu, Haz Sameen Shahgir, Huanli Gong +3 more
Large language models (LLMs) increasingly combine long-context processing with advanced reasoning, enabling them to retrieve and synthesize...