Benchmark HIGH
Xiaojun Jia, Jie Liao, Qi Guo +11 more
Recent advances in multi-modal large language models (MLLMs) have enabled unified perception-reasoning capabilities, yet these systems remain highly...
5 months ago cs.CR cs.CV
Benchmark HIGH
Caleb Gross
Security research is fundamentally a problem of resource constraint and consequent prioritization. There is simply too much attack surface and too...
5 months ago cs.CR cs.IR
Benchmark HIGH
Xiuyuan Chen, Jian Zhao, Yuxiang He +10 more
While the deployment of large language models (LLMs) in high-value industries continues to expand, the systematic assessment of their safety against...
Benchmark HIGH
Songwen Zhao, Danqing Wang, Kexun Zhang +3 more
Vibe coding is a new programming paradigm in which human engineers instruct large language model (LLM) agents to complete complex coding tasks with...
5 months ago cs.SE cs.CL
Benchmark HIGH
Jiawei Chen, Yang Yang, Chao Yu +6 more
Large Reasoning Models (LRMs) have emerged as a powerful advancement in multi-step reasoning tasks, offering enhanced transparency and logical...
5 months ago cs.CR cs.AI
Benchmark HIGH
Juncheng Li, Yige Li, Hanxun Huang +5 more
Backdoor attacks undermine the reliability and trustworthiness of machine learning systems by injecting hidden behaviors that can be maliciously...
Benchmark HIGH
Zhijie Chen, Xiang Chen, Ziming Li +2 more
Context: Software Vulnerability Assessment (SVA) plays a vital role in evaluating and ranking vulnerabilities in software systems to ensure their...
Benchmark HIGH
Chunyang Li, Zifeng Kang, Junwei Zhang +4 more
The adoption of Vision-Language Models (VLMs) in embodied AI agents, while being effective, brings safety concerns such as jailbreaking. Prior work...
5 months ago cs.CR cs.CY cs.RO
Benchmark HIGH
Henry Wong, Clement Fung, Weiran Lin +3 more
To autonomously control vehicles, driving agents use outputs from a combination of machine-learning (ML) models, controller logic, and custom...
5 months ago cs.CR cs.CV cs.LG
Benchmark HIGH
Jiayu Li, Yunhan Zhao, Xiang Zheng +4 more
Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks, yet their integration of...
5 months ago cs.CR cs.AI cs.CV
Benchmark HIGH
Zhishen Sun, Guang Dai, Haishan Ye
LLMs demonstrate performance comparable to human abilities in complex tasks such as mathematical reasoning, but their robustness in mathematical...
Benchmark HIGH
Kaiwen Zhou, Ahmed Elgohary, A S M Iftekhar +1 more
The ability of LLM agents to plan and invoke tools exposes them to new safety risks, making a comprehensive red-teaming system crucial for...
6 months ago cs.CR cs.AI cs.CL
Benchmark HIGH
Euodia Dodd, Nataša Krčo, Igor Shilov +1 more
Membership inference attacks (MIAs) have emerged as the standard tool for evaluating the privacy risks of AI models. However, state-of-the-art...
6 months ago cs.LG cs.CR
Benchmark HIGH
Osama Al Haddad, Muhammad Ikram, Ejaz Ahmed +1 more
Security analysts face increasing pressure to triage large and complex vulnerability backlogs. Large Language Models (LLMs) offer a potential aid by...
Benchmark HIGH
Pranshav Gajjar, Molham Khoja, Abiodun Ganiyu +4 more
The impending adoption of Open Radio Access Network (O-RAN) is fueling innovation in the RAN towards data-driven operation. Unlike traditional RAN...
6 months ago cs.CR cs.NI
Benchmark HIGH
Chengquan Guo, Yuzhou Nie, Chulin Xie +3 more
As large language models (LLMs) are increasingly used for code generation, concerns over the security risks have grown substantially. Early research...
Benchmark HIGH
Bin Liu, Yanjie Zhao, Guoai Xu +1 more
Large language model (LLM) agents have demonstrated remarkable capabilities in software engineering and cybersecurity tasks, including code...
6 months ago cs.SE cs.CR
Benchmark HIGH
Trilok Padhi, Pinxian Lu, Abdulkadir Erol +5 more
Large Language Model (LLM) agents are powering a growing share of interactive web applications, yet remain vulnerable to misuse and harm. Prior...
Benchmark HIGH
Ivan Dubrovsky, Anastasia Orlova, Illarion Iov +3 more
Benchmarking outcomes increasingly govern trust, selection, and deployment of LLMs, yet these evaluations remain vulnerable to semantically...
Benchmark HIGH
Dongsen Zhang, Zekun Li, Xu Luo +3 more
The Model Context Protocol (MCP) standardizes how large language model (LLM) agents discover, describe, and call external tools. While MCP unlocks...
7 months ago cs.CR cs.AI