Tool MEDIUM
Herman Errico
As artificial intelligence systems evolve from passive assistants into autonomous agents capable of executing consequential actions, the security...
3 months ago cs.CR cs.AI
PDF
Benchmark LOW
Pei-Chi Pan, Yingbin Liang, Sen Lin
Large Language Models (LLMs) demonstrate transformative potential, yet their reasoning remains inconsistent and unreliable. Reinforcement learning...
Benchmark HIGH
Chaeyun Kim, YongTaek Lim, Kihyun Kim +2 more
Existing red-teaming benchmarks, when adapted to new languages via direct translation, fail to capture socio-technical vulnerabilities rooted in...
3 months ago cs.CY cs.AI
PDF
Attack HIGH
Georgios Syros, Evan Rose, Brian Grinstead +4 more
Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by directly interacting with websites and...
3 months ago cs.CR cs.AI
PDF
Attack HIGH
Kotekar Annapoorna Prabhu, Andrew Gan, Zahra Ghodsi
Machine learning relies on randomness as a fundamental component in various steps such as data sampling, data augmentation, weight initialization,...
3 months ago cs.CR cs.LG
PDF
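The abstract above lists the randomness sources a typical ML pipeline depends on. As a minimal sketch (the `seed_everything` helper is illustrative, not from the paper), controlling those sources usually means seeding every generator in play:

```python
import random

def seed_everything(seed: int) -> None:
    """Seed the randomness sources a typical training script touches."""
    random.seed(seed)  # Python-level data sampling / shuffling
    # Frameworks add their own generators; in practice one would also call
    # e.g. np.random.seed(seed) and torch.manual_seed(seed) here.

seed_everything(42)
shuffled = random.sample(range(10), k=10)

seed_everything(42)
assert shuffled == random.sample(range(10), k=10)  # bit-identical replay
```

If any of these generators is left unseeded (or can be influenced by an attacker, as the paper's threat model suggests), sampling, augmentation, and initialization all become attack surface.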
Benchmark LOW
Ashwin Sreevatsa, Sebastian Prasanna, Cody Rushing
The AI Control research agenda aims to develop control protocols: safety techniques that prevent untrusted AI systems from taking harmful actions...
3 months ago cs.CR cs.LG cs.SE
PDF
Benchmark MEDIUM
Yuting Ning, Jaylen Jones, Zhehao Zhang +5 more
Computer-use agents (CUAs) have made tremendous progress in the past year, yet they still frequently produce misaligned actions that deviate from the...
Attack HIGH
Yu Yan, Sheng Sun, Shengjia Cheng +3 more
Vision-Language Models (VLMs) with multimodal reasoning capabilities are high-value attack targets, given their potential for handling complex...
3 months ago cs.CR cs.AI
PDF
Attack HIGH
Suraj Ranganath, Atharv Ramesh
AI-text detectors face a critical robustness challenge: adversarial paraphrasing attacks that preserve semantics while evading detection. We...
3 months ago cs.LG cs.AI cs.CR
PDF
Defense MEDIUM
Oliver Daniels, Perusha Moodley, Benjamin M. Marlin +1 more
Alignment audits aim to robustly identify hidden goals from strategic, situationally aware misaligned models. Despite this threat model, existing...
Defense MEDIUM
Yu Fu, Haz Sameen Shahgir, Huanli Gong +3 more
Large language models (LLMs) increasingly combine long-context processing with advanced reasoning, enabling them to retrieve and synthesize...
3 months ago cs.CL cs.CR
PDF
Attack HIGH
Jona te Lintelo, Lichao Wu, Stjepan Picek
The rapid adoption of Mixture-of-Experts (MoE) architectures marks a major shift in the deployment of Large Language Models (LLMs). MoE LLMs improve...
Survey LOW
Shae McFadden, Myles Foley, Elizabeth Bates +5 more
Deep Reinforcement Learning (DRL) has achieved remarkable success in domains requiring sequential decision-making, motivating its application to...
3 months ago cs.LG cs.CR
PDF
Attack HIGH
Yanzhang Fu, Zizheng Guo, Jizhou Luo
Score-based query attacks pose a serious threat to deep learning models by crafting adversarial examples (AEs) using only black-box access to model...
3 months ago cs.LG cs.CR
PDF
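A score-based query attack of the kind the abstract describes observes only the model's output scores and greedily perturbs the input to drive the true-class score down. A minimal sketch in the spirit of coordinate-wise attacks such as SimBA (the `predict` oracle, `eps`, and `max_queries` are illustrative assumptions, not the paper's method):

```python
import random

def score_attack(predict, x, true_label, eps=0.1, max_queries=200):
    """Greedy random-search attack using only black-box score access:
    try +/-eps on a random coordinate, keep the step if the true-class
    score drops."""
    x = list(x)
    best = predict(x)[true_label]          # only output scores are observed
    for _ in range(max_queries):
        i = random.randrange(len(x))
        for sign in (eps, -eps):
            cand = list(x)
            cand[i] += sign
            score = predict(cand)[true_label]
            if score < best:               # keep any score-decreasing step
                x, best = cand, score
                break
    return x, best
```

Each iteration costs at most two queries, which is why query efficiency is the central metric for this attack family.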
Attack HIGH
Scott Thornton
Hybrid Retrieval-Augmented Generation (RAG) pipelines combine vector similarity search with knowledge graph expansion for multi-hop reasoning. We...
3 months ago cs.CR cs.IR cs.LG
PDF
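The hybrid pipeline the abstract names has two retrieval stages: dense vector similarity to find seed documents, then knowledge-graph expansion to pull in multi-hop neighbors. A minimal sketch under assumed toy data structures (dict-of-vectors index, adjacency-list graph; none of the names come from the paper):

```python
def hybrid_retrieve(query_vec, docs, graph, hops=1, top_k=2):
    """Hybrid RAG sketch: dense retrieval by dot-product similarity,
    then expand the hit set through knowledge-graph neighbors."""
    def sim(v, w):
        return sum(a * b for a, b in zip(v, w))

    # Stage 1: vector similarity search over the document index.
    hits = sorted(docs, key=lambda d: sim(query_vec, docs[d]), reverse=True)[:top_k]

    # Stage 2: multi-hop expansion over the knowledge graph.
    frontier, seen = set(hits), set(hits)
    for _ in range(hops):
        frontier = {n for d in frontier for n in graph.get(d, [])} - seen
        seen |= frontier
    return seen
```

The expansion stage is what makes this architecture interesting from a security standpoint: a poisoned graph edge can drag unrelated content into the context even when vector search is clean.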
Defense MEDIUM
Yukun Jiang, Hai Huang, Mingjie Li +3 more
By introducing routers to selectively activate experts in Transformer layers, the mixture-of-experts (MoE) architecture significantly reduces...
3 months ago cs.LG cs.AI cs.CR
PDF
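The routing mechanism the abstract refers to can be sketched in a few lines: for each token, the router scores all experts, keeps the top-k, and normalizes their gate weights so only those experts run. This is a generic top-k MoE router sketch, not the paper's specific design:

```python
import math

def top_k_route(logits, k=2):
    """Minimal MoE router: select the k highest-scoring experts and
    softmax-normalize their gate weights; all other experts get weight 0."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in idx]
    z = sum(exps)
    return {i: e / z for i, e in zip(idx, exps)}

gates = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)  # experts 1 and 3 selected
```

Because the router decides which parameters a given input ever touches, it is a natural target for the attacks and defenses this line of work studies.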
Benchmark LOW
Ahmed Salem, Andrew Paverd, Sahar Abdelnabi
Large language models (LLMs) are commonly treated as stateless: once an interaction ends, no information is assumed to persist unless it is...
3 months ago cs.LG cs.AI cs.CR
PDF
Benchmark MEDIUM
Igor Santos-Grueiro
Safety evaluation for advanced AI systems assumes that behavior observed under evaluation predicts behavior in deployment. This assumption weakens...
3 months ago cs.AI cs.CR cs.LG
PDF
Benchmark MEDIUM
Pouria Arefijamal, Mahdi Ahmadlou, Bardia Safaei +1 more
Federated learning (FL) is a decentralized learning paradigm widely adopted in resource-constrained Internet of Things (IoT) environments. These...
3 months ago cs.LG cs.CR cs.DC
PDF
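In the decentralized paradigm the abstract describes, clients train locally and a server only aggregates model updates, so raw IoT data never leaves the devices. A minimal FedAvg-style aggregation sketch (plain weighted averaging over flat weight vectors; an illustrative simplification, not the paper's protocol):

```python
def fed_avg(client_weights, client_sizes):
    """FedAvg sketch: average client weight vectors, weighted by each
    client's local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]

# Two equal-sized clients: the global model is the plain mean.
global_w = fed_avg([[1.0, 1.0], [3.0, 3.0]], [1, 1])  # -> [2.0, 2.0]
```

The weighted sum is also where FL's best-known weakness lives: a single client submitting poisoned weights shifts the global model in proportion to its claimed dataset size.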