Benchmark MEDIUM
Zhanguang Zhang, Zhiyuan Li, Behnam Rahmati +10 more
Robot action planning in the real world is challenging as it requires not only understanding the current state of the environment but also predicting...
Benchmark MEDIUM
Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera
Large language models are becoming pervasive core components in many real-world applications. As a consequence, security alignment represents a...
2 days ago cs.CR cs.AI cs.CL
Benchmark LOW
Mohammad Asadi, Jack W. O'Sullivan, Fang Cao +5 more
Multimodal AI systems have achieved remarkable performance across a broad range of real-world tasks, yet the mechanisms underlying visual-language...
Benchmark LOW
Zhongyi Li, Wan Tian, Jingyu Chen +8 more
Multi-agent collaboration has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models, yet it suffers from...
Benchmark LOW
Zongjie Li, Chaozheng Wang, Yuchong Xie +2 more
Large Language Models are increasingly being considered for deployment in safety-critical military applications. However, current benchmarks suffer...
2 days ago cs.CY cs.AI
Benchmark LOW
Zihan Guo, Zhiyu Chen, Xiaohang Nie +3 more
With the rapid evolution of Large Language Model (LLM) agent ecosystems, centralized skill marketplaces have emerged as pivotal infrastructure for...
3 days ago cs.CR cs.SE
Benchmark LOW
Yandan Zheng, Haoran Luo, Zhenghong Lin +2 more
Benchmarks are the de facto standard for tracking progress in large language models (LLMs), yet static test sets can rapidly saturate, become...
Benchmark HIGH
Sen Fang, Weiyuan Ding, Zhezhen Cao +2 more
Large Language Models (LLMs) are increasingly adopted for vulnerability detection, yet their reasoning remains fundamentally unsound. We identify a...
4 days ago cs.SE cs.AI cs.CR
Benchmark MEDIUM
Jiahao Chen, Zhiming Zhao, Yuwen Pu +4 more
Federated learning (FL) has attracted substantial attention in both academia and industry, yet its practical security posture remains poorly...
Benchmark MEDIUM
Hung Yun Tseng, Wuzhen Li, Blerina Gkotse +1 more
The potential of Large Language Models (LLMs) to provide harmful information remains a significant concern due to the vast breadth of illegal queries...
Benchmark MEDIUM
Christopher J. Agostino, Quan Le Thien, Nayan D'Souza +1 more
Understanding the fundamental mechanisms governing the production of meaning in the processing of natural language is critical for designing safe,...
4 days ago cs.CL cs.AI cs.HC
Benchmark MEDIUM
Fazhong Liu, Zhuoyan Chen, Tu Lan +6 more
Autonomous coding agents are increasingly integrated into software development workflows, offering capabilities that extend beyond code suggestion to...
5 days ago cs.CR cs.AI
Benchmark LOW
Dong Yan, Jian Liang, Yanbo Wang +3 more
Test-Time Reinforcement Learning (TTRL) enables Large Language Models (LLMs) to enhance reasoning capabilities on unlabeled test streams by deriving...
5 days ago cs.LG cs.AI
Benchmark LOW
Zou Qiang
Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under...
5 days ago cs.AI cs.CL
Benchmark MEDIUM
Zikang Ding, Junhao Li, Suling Wu +3 more
Model watermarking utilizes internal representations to protect the ownership of large language models (LLMs). However, these features inevitably...
6 days ago cs.CR cs.AI
Benchmark LOW
Alvin Rajkomar, Pavan Sudarshan, Angela Lai +1 more
Background: Clinical trials rely on transparent inclusion criteria to ensure generalizability. In contrast, benchmarks validating health-related...
Benchmark HIGH
Iakovos-Christos Zarkadis, Christos Douligeris
Supervised detection of network attacks has always been a critical part of network intrusion detection systems (NIDS). Nowadays, in a pivotal time...
1 week ago cs.CR cs.AI stat.AP
Benchmark MEDIUM
Haocheng Li, Juepeng Zheng, Shuangxi Miao +4 more
Multimodal remote sensing semantic segmentation enhances scene interpretation by exploiting complementary physical cues from heterogeneous data....
Benchmark MEDIUM
Wanjun Du, Zifeng Yuan, Tingting Chen +3 more
Existing vision-language models (VLMs) have demonstrated impressive performance in reasoning-based segmentation. However, current benchmarks are...
1 week ago cs.CV cs.AI
Benchmark MEDIUM
Yuntong Zhang, Sungmin Kang, Ruijie Meng +2 more
Agentic AI has been a topic of great interest recently. A Large Language Model (LLM) agent involves one or more LLMs in the back-end. In the front...