Benchmark LOW
Sara AlMahri, Liming Xu, Alexandra Brintrup
Modern supply chains are increasingly exposed to disruptions ranging from geopolitical events, demand shocks, and trade restrictions to natural disasters. While...
Benchmark MEDIUM
Greta Dolcetti, Giulio Zizzo, Sergio Maffeis
We present an experimental evaluation that assesses the robustness of four open source LLMs claiming function-calling capabilities against three...
4 months ago cs.CR cs.AI
PDF
Benchmark HIGH
Shaznin Sultana, Sadia Afreen, Nasir U. Eisty
Context: Traditional software security analysis methods struggle to keep pace with the scale and complexity of modern codebases, requiring...
Benchmark MEDIUM
Ziqi Ding, Yunfeng Wan, Wei Song +7 more
CAPTCHAs are widely used by websites to block bots and spam by presenting challenges that are easy for humans but difficult for automated programs to...
4 months ago cs.SD cs.CY eess.AS
PDF
Benchmark MEDIUM
Seong-Gyu Park, Sohee Park, Jisu Lee +2 more
Recent LLMs increasingly integrate reasoning mechanisms like Chain-of-Thought (CoT). However, this explicit reasoning exposes a new attack surface...
4 months ago cs.CL cs.CR cs.LG
PDF
Benchmark MEDIUM
Erin Feiglin, Nir Hutnik, Raz Lapid
We investigate a failure mode of large language models (LLMs) in which plain-text prompts elicit excessive outputs, a phenomenon we term Overflow....
4 months ago cs.CL cs.AI
PDF
Benchmark MEDIUM
Dongryeol Lee, Yerin Hwang, Taegwan Kang +3 more
While large language models (LLMs) are increasingly used as automatic judges for question answering (QA) and other reference-conditioned evaluation...
Benchmark LOW
Huipeng Ma, Luan Zhang, Dandan Song +10 more
In multi-hop reasoning, multi-round retrieval-augmented generation (RAG) methods typically rely on LLM-generated content as the retrieval query....
Benchmark MEDIUM
Weipeng Jiang, Xiaoyu Zhang, Juan Zhai +3 more
Emoticons are widely used in digital communication to convey affective intent, yet their safety implications for Large Language Models (LLMs) remain...
4 months ago cs.CR cs.AI cs.SE
PDF
Benchmark LOW
Andrew D. Maynard
Large language model (LLM)-based conversational AI systems present a challenge to human cognition that current frameworks for understanding...
4 months ago cs.HC cs.AI cs.CY
PDF
Benchmark MEDIUM
Ying Zhou, Jiacheng Wei, Yu Qi +2 more
Large language models (LLMs) demonstrate remarkable capabilities in natural language understanding and generation. Despite being trained on...
4 months ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Vasanth Iyer, Leonardo Bobadilla, S. S. Iyengar
Large Language Models (LLMs) such as Gemma-2B have shown strong performance in various natural language processing tasks. However, general-purpose...
Benchmark MEDIUM
Qiang Zhang, Elena Emma Wang, Jiaming Li +1 more
This study presents a Secure Multi-Tenant Architecture (SMTA) combined with a novel Burn-After-Use (BAU) mechanism for enterprise LLM...
4 months ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Minfeng Qi, Dongyang He, Qin Wang +1 more
Visual Reasoning CAPTCHAs (VRCs) combine visual scenes with natural-language queries that demand compositional inference over objects, attributes,...
4 months ago cs.CR cs.CV cs.ET
PDF
Benchmark MEDIUM
Keyang Zhang, Zeyu Chen, Xuan Feng +4 more
The security of scripting languages such as PowerShell is critical given their powerful automation and administration capabilities, often exercised...
4 months ago cs.CR cs.PL
PDF
Benchmark MEDIUM
Hoang-Chau Luong, Lingwei Chen
Low-Rank Adaptation (LoRA) is widely used for parameter-efficient fine-tuning of large language models, but it is notably ineffective at removing...
Benchmark MEDIUM
Tianshi Li
On December 4, 2025, Anthropic released Anthropic Interviewer, an AI tool for running qualitative interviews at scale, along with a public dataset of...
4 months ago cs.CR cs.AI cs.CY
PDF
Benchmark MEDIUM
Zhi Yang, Runguo Li, Qiqi Qiang +15 more
Financial agents powered by large language models (LLMs) are increasingly deployed for investment analysis, risk assessment, and automated...
4 months ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Suyash Mishra, Qiang Li, Srikanth Patil +1 more
Vision Language Models (VLMs) are poised to revolutionize the digital transformation of the pharmaceutical industry by enabling intelligent, scalable,...
4 months ago cs.CV cs.LG
PDF
Benchmark MEDIUM
Konstantinos E. Kampourakis, Vyron Kampourakis, Efstratios Chatzoglou +2 more
Realistic, large-scale, and well-labeled cybersecurity datasets are essential for training and evaluating Intrusion Detection Systems (IDS). However,...