Benchmark LOW
Yao Tong, Haonan Wang, Siquan Li +2 more
Fingerprinting Large Language Models (LLMs) is essential for provenance verification and model attribution. Existing methods typically extract...
7 months ago cs.CR cs.AI cs.CL
Benchmark LOW
Joel Dyer, Daniel Jarne Ornia, Nicholas Bishop +2 more
Evaluating the safety of frontier AI systems is an increasingly important concern, helping to measure the capabilities of such models and identify...
7 months ago cs.LG cs.AI stat.ML
Benchmark LOW
Yixu Wang, Xin Wang, Yang Yao +4 more
The rapid integration of Large Language Models (LLMs) into high-stakes domains necessitates reliable safety and compliance evaluation. However,...
Benchmark HIGH
Simin Chen, Yixin He, Suman Jana +1 more
LLM-based agents are increasingly deployed for software maintenance tasks such as automated program repair (APR). APR agents automatically fetch...
Benchmark LOW
Ruolin Chen, Yinqian Sun, Jihang Wang +3 more
Embodied agents powered by large language models (LLMs) inherit advanced planning capabilities; however, their direct interaction with the physical...
Benchmark LOW
Xiang Zhang, Kun Wei, Xu Yang +3 more
As Large Language Models (LLMs) become increasingly prevalent, their security vulnerabilities have already drawn attention. Machine unlearning is...
7 months ago cs.LG cs.CL
Benchmark LOW
Jiacheng Shi, Hongfei Du, Y. Alicia Hong +1 more
Speech emotion recognition (SER) with audio-language models (ALMs) remains vulnerable to distribution shifts at test time, leading to performance...
7 months ago cs.SD cs.AI
Benchmark LOW
Aryan Yazdan Parast, Parsa Hosseini, Hesam Asadollahzadeh +4 more
Object hallucination in Multimodal Large Language Models (MLLMs) is a persistent failure mode that causes the model to perceive objects absent in the...
7 months ago cs.CV cs.AI cs.LG
Benchmark LOW
Adrian Arnaiz-Rodriguez, Miguel Baidal, Erik Derner +5 more
Large language model-powered chatbots have transformed how people seek information, especially in high-stakes contexts like mental health. Despite...
7 months ago cs.CL cs.CY
Benchmark MEDIUM
Su Kara, Fazle Faisal, Suman Nath
Recent advances in browser-based LLM agents have shown promise for automating tasks ranging from simple form filling to hotel booking or online...
7 months ago cs.AI cs.CR cs.LG
Benchmark MEDIUM
Yihan Wu, Ruibo Chen, Georgios Milis +1 more
As large language models become increasingly capable and widely deployed, verifying the provenance of machine-generated content is critical to...
Benchmark HIGH
Alireza Lotfi, Charalampos Katsis, Elisa Bertino
Software vulnerabilities remain a critical security challenge, providing entry points for attackers into enterprise networks. Despite advances in...
Benchmark MEDIUM
Meet Udeshi, Venkata Sai Charan Putrevu, Prashanth Krishnamurthy +4 more
Security of software supply chains is necessary to ensure that software updates do not contain maliciously injected code or introduce vulnerabilities...
Benchmark MEDIUM
Shuyi Lin, Tian Lu, Zikai Wang +3 more
OpenAI's GPT-OSS family provides open-weight language models with explicit chain-of-thought (CoT) reasoning and a Harmony prompt format. We summarize...
7 months ago cs.AI cs.CR
Benchmark LOW
Nayeong Kim, Seong Joon Oh, Suha Kwak
Parameter-efficient fine-tuning (PEFT) of vision-language models (VLMs) excels in various vision tasks thanks to the rich knowledge and...
7 months ago cs.CV cs.AI
Benchmark HIGH
Jianshuo Dong, Sheng Guo, Hao Wang +6 more
Search agents connect LLMs to the Internet, enabling them to access broader and more up-to-date information. However, this also introduces a new...
7 months ago cs.AI cs.CL cs.CR
Benchmark MEDIUM
Sihan Hu, Xiansheng Cai, Yuan Huang +5 more
Training large language models with Reinforcement Learning with Verifiable Rewards (RLVR) exhibits a set of distinctive and puzzling behaviors that...
7 months ago cs.AI cond-mat.dis-nn cond-mat.stat-mech
Benchmark MEDIUM
Sherif Saad, Kevin Shi, Mohammed Mamun +1 more
Automated machine learning (AutoML) has emerged as a promising paradigm for automating machine learning (ML) pipeline design, broadening AI adoption....
Benchmark MEDIUM
Xiaotian Zou
Multimodal Large Language Models (MLLMs) have transformed text-to-image workflows, allowing designers to create novel visual concepts with...
7 months ago cs.CV cs.AI
Benchmark MEDIUM
Antreas Ioannou, Andreas Shiamishis, Nora Hollenstein +1 more
In an era dominated by Large Language Models (LLMs), understanding their capabilities and limitations, especially in high-stakes fields like law, is...
7 months ago cs.CL cs.AI cs.LG