Benchmark LOW
O. Clerc, R. Abdelghani, C. Desvaux +3 more
The rapid adoption of generative artificial intelligence (GenAI) in schools raises concerns about students' uncritical reliance on its outputs....
Benchmark MEDIUM
Yiheng Huang, Zhijia Zhao, Bihuan Chen +5 more
The model context protocol (MCP) standardizes how LLMs connect to external tools and data sources, enabling faster integration but introducing new...
1 month ago cs.CR cs.SE
Benchmark LOW
Yukai Ma, Honglin He, Selina Song +2 more
Long-horizon navigation in complex urban environments relies heavily on continuous human operation, which leads to fatigue, reduced efficiency, and...
Benchmark MEDIUM
Weidi Luo, Xiaofei Wen, Tenghao Huang +5 more
Large language models (LLMs) are increasingly deployed for everyday tasks, including food preparation and health-related guidance. However, food...
Benchmark MEDIUM
Kıvanç Kuzey Dikici, Serdar Kara, Semih Çağlar +2 more
As Large Language Models (LLMs) for code increasingly utilize massive, often non-permissively licensed datasets, evaluating data contamination...
1 month ago cs.SE cs.CR
Benchmark LOW
Yao Qin, Yangyang Yan, Jinhua Pang +1 more
The integration of Large Language Models (LLMs) into life sciences has catalyzed the development of "AI Scientists." However, translating these...
Benchmark MEDIUM
Yanting Wang, Jinyuan Jia
Random subspace method has wide security applications such as providing certified defenses against adversarial and backdoor attacks, and building...
Benchmark MEDIUM
Yubo Li, Lu Zhang, Tianchong Jiang +2 more
Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a...
1 month ago cs.CL cs.AI
Benchmark MEDIUM
Yicheng Cai, Mitchell John DeStefano, Guodong Dong +5 more
As Large Language Models (LLMs) and multi-agent AI systems are demonstrating increasing potential in cybersecurity operations, organizations,...
1 month ago cs.CR cs.AI
Benchmark MEDIUM
Quan Zhang, Lianhang Fu, Lvsi Lian +5 more
Equipping LLM agents with real-world tools can substantially improve productivity. However, granting agents autonomy over tool use also transfers the...
1 month ago cs.CR cs.AI
Benchmark LOW
Kesheng Chen, Yamin Hu, Qi Zhou +2 more
Vision-language models (VLMs) achieve strong performance on many benchmarks, yet a basic reliability question remains underexplored: when visual...
1 month ago cs.CV cs.AI cs.CL
Benchmark MEDIUM
Vishal Narnaware, Animesh Gupta, Kevin Zhai +2 more
Multimodal Diffusion Large Language Models (MDLLMs) achieve high-concurrency generation through parallel masked decoding, yet the architectures...
Benchmark MEDIUM
Pei Chen, Geng Hong, Xinyi Wu +6 more
The emergence of Large Language Model-enhanced Search Engines (LLMSEs) has revolutionized information retrieval by integrating web-scale search...
1 month ago cs.CR cs.IR
Benchmark LOW
Zhihui Yao, Hengran Zhang, Keping Bi
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) with external knowledge but remains vulnerable to low-authority sources...
Benchmark LOW
Francesco Gentile, Nicola Dall'Asen, Francesco Tonini +3 more
As vision-language models are deployed at scale, understanding their internal mechanisms becomes increasingly critical. Existing interpretability...
Benchmark MEDIUM
Michael Somma, Markus Großpointner, Paul Zabalegui +2 more
The increasing complexity and interconnectivity of digital infrastructures make scalable and reliable security assessment methods essential. Robotic...
1 month ago cs.RO cs.AI
Benchmark MEDIUM
Oussama Draissi, Mark Günzel, Ahmad-Reza Sadeghi +1 more
WebAssembly's (Wasm) monolithic linear memory model facilitates memory corruption attacks that can escalate to cross-site scripting in browsers or go...
1 month ago cs.CR cs.LG
Benchmark MEDIUM
Zhanguang Zhang, Zhiyuan Li, Behnam Rahmati +10 more
Robot action planning in the real world is challenging as it requires not only understanding the current state of the environment but also predicting...
Benchmark MEDIUM
Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera
Large language models are becoming pervasive core components in many real-world applications. As a consequence, security alignment represents a...
1 month ago cs.CR cs.AI cs.CL
Benchmark LOW
Mohammad Asadi, Jack W. O'Sullivan, Fang Cao +5 more
Multimodal AI systems have achieved remarkable performance across a broad range of real-world tasks, yet the mechanisms underlying visual-language...