Benchmark MEDIUM
Quan Zhang, Lianhang Fu, Lvsi Lian +5 more
Equipping LLM agents with real-world tools can substantially improve productivity. However, granting agents autonomy over tool use also transfers the...
1 month ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Vishal Narnaware, Animesh Gupta, Kevin Zhai +2 more
Multimodal Diffusion Large Language Models (MDLLMs) achieve high-concurrency generation through parallel masked decoding, yet the architectures...
Benchmark MEDIUM
Pei Chen, Geng Hong, Xinyi Wu +6 more
The emergence of Large Language Model-enhanced Search Engines (LLMSEs) has revolutionized information retrieval by integrating web-scale search...
1 month ago cs.CR cs.IR
PDF
Benchmark MEDIUM
Michael Somma, Markus Großpointner, Paul Zabalegui +2 more
The increasing complexity and interconnectivity of digital infrastructures make scalable and reliable security assessment methods essential. Robotic...
1 month ago cs.RO cs.AI
PDF
Benchmark MEDIUM
Oussama Draissi, Mark Günzel, Ahmad-Reza Sadeghi +1 more
WebAssembly's (Wasm) monolithic linear memory model facilitates memory corruption attacks that can escalate to cross-site scripting in browsers or go...
1 month ago cs.CR cs.LG
PDF
Benchmark MEDIUM
Zhanguang Zhang, Zhiyuan Li, Behnam Rahmati +10 more
Robot action planning in the real world is challenging as it requires not only understanding the current state of the environment but also predicting...
Benchmark MEDIUM
Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera
Large language models are becoming pervasive core components in many real-world applications. As a consequence, security alignment represents a...
1 month ago cs.CR cs.AI cs.CL
PDF
Benchmark MEDIUM
Jiahao Chen, Zhiming Zhao, Yuwen Pu +4 more
Federated learning (FL) has attracted substantial attention in both academia and industry, yet its practical security posture remains poorly...
Benchmark MEDIUM
Hung Yun Tseng, Wuzhen Li, Blerina Gkotse +1 more
The potential of Large Language Models (LLMs) to provide harmful information remains a significant concern due to the vast breadth of illegal queries...
Benchmark MEDIUM
Christopher J. Agostino, Quan Le Thien, Nayan D'Souza +1 more
Understanding the fundamental mechanisms governing the production of meaning in the processing of natural language is critical for designing safe,...
1 month ago cs.CL cs.AI cs.HC
PDF
Benchmark MEDIUM
Fazhong Liu, Zhuoyan Chen, Tu Lan +6 more
Autonomous coding agents are increasingly integrated into software development workflows, offering capabilities that extend beyond code suggestion to...
1 month ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Zikang Ding, Junhao Li, Suling Wu +3 more
Model watermarking utilizes internal representations to protect the ownership of large language models (LLMs). However, these features inevitably...
1 month ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Haocheng Li, Juepeng Zheng, Shuangxi Miao +4 more
Multimodal remote sensing semantic segmentation enhances scene interpretation by exploiting complementary physical cues from heterogeneous data....
Benchmark MEDIUM
Wanjun Du, Zifeng Yuan, Tingting Chen +3 more
Existing vision-language models (VLMs) have demonstrated impressive performance in reasoning-based segmentation. However, current benchmarks are...
1 month ago cs.CV cs.AI
PDF
Benchmark MEDIUM
Yuntong Zhang, Sungmin Kang, Ruijie Meng +2 more
Agentic AI has been a topic of great interest recently. A Large Language Model (LLM) agent involves one or more LLMs in the back-end. In the front...
Benchmark MEDIUM
Caglar Yildirim
Large language models (LLMs) are increasingly deployed as tool-using agents, shifting safety concerns from harmful text generation to harmful task...
Benchmark MEDIUM
Gengxin Sun, Ruihao Yu, Liangyi Yin +3 more
Ensuring robust and fair interview assessment remains a key challenge in AI-driven evaluation. This paper presents CoMAI, a general-purpose...
1 month ago cs.MA cs.AI
PDF
Benchmark MEDIUM
Yu Pan, Wenlong Yu, Tiejun Wu +4 more
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. However, they remain highly susceptible to...
1 month ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Ye Wang, Jing Liu, Toshiaki Koike-Akino
The safety and reliability of vision-language models (VLMs) are a crucial part of deploying trustworthy agentic AI systems. However, VLMs remain...
1 month ago cs.LG cs.AI cs.CL
PDF
Benchmark MEDIUM
Yuhuan Liu, Haitian Zhong, Xinyuan Xia +3 more
Large Language Models (LLMs) often suffer from catastrophic forgetting and collapse during sequential knowledge editing. This vulnerability stems...