Benchmark MEDIUM
Seong-Gyu Park, Sohee Park, Jisu Lee +2 more
Recent LLMs increasingly integrate reasoning mechanisms like Chain-of-Thought (CoT). However, this explicit reasoning exposes a new attack surface...
3 months ago cs.CL cs.CR cs.LG
Benchmark MEDIUM
Erin Feiglin, Nir Hutnik, Raz Lapid
We investigate a failure mode of large language models (LLMs) in which plain-text prompts elicit excessive outputs, a phenomenon we term Overflow....
3 months ago cs.CL cs.AI
Benchmark MEDIUM
Dongryeol Lee, Yerin Hwang, Taegwan Kang +3 more
While large language models (LLMs) are increasingly used as automatic judges for question answering (QA) and other reference-conditioned evaluation...
Benchmark LOW
Huipeng Ma, Luan Zhang, Dandan Song +10 more
In multi-hop reasoning, multi-round retrieval-augmented generation (RAG) methods typically rely on LLM-generated content as the retrieval query....
Benchmark MEDIUM
Weipeng Jiang, Xiaoyu Zhang, Juan Zhai +3 more
Emoticons are widely used in digital communication to convey affective intent, yet their safety implications for Large Language Models (LLMs) remain...
4 months ago cs.CR cs.AI cs.SE
Benchmark LOW
Andrew D. Maynard
Large language model (LLM)-based conversational AI systems present a challenge to human cognition that current frameworks for understanding...
4 months ago cs.HC cs.AI cs.CY
Benchmark MEDIUM
Ying Zhou, Jiacheng Wei, Yu Qi +2 more
Large language models (LLMs) demonstrate remarkable capabilities in natural language understanding and generation. Despite being trained on...
4 months ago cs.CR cs.AI
Benchmark MEDIUM
Vasanth Iyer, Leonardo Bobadilla, S. S. Iyengar
Large Language Models (LLMs) such as Gemma-2B have shown strong performance in various natural language processing tasks. However, general-purpose...
Benchmark MEDIUM
Qiang Zhang, Elena Emma Wang, Jiaming Li +1 more
This study presents a Secure Multi-Tenant Architecture (SMTA) combined with a novel Burn-After-Use (BAU) mechanism for enterprise LLM...
4 months ago cs.CR cs.AI
Benchmark MEDIUM
Minfeng Qi, Dongyang He, Qin Wang +1 more
Visual Reasoning CAPTCHAs (VRCs) combine visual scenes with natural-language queries that demand compositional inference over objects, attributes,...
4 months ago cs.CR cs.CV cs.ET
Benchmark MEDIUM
Keyang Zhang, Zeyu Chen, Xuan Feng +4 more
The security of scripting languages such as PowerShell is critical given their powerful automation and administration capabilities, often exercised...
4 months ago cs.CR cs.PL
Benchmark MEDIUM
Hoang-Chau Luong, Lingwei Chen
Low-Rank Adaptation (LoRA) is widely used for parameter-efficient fine-tuning of large language models, but it is notably ineffective at removing...
Benchmark MEDIUM
Tianshi Li
On December 4, 2025, Anthropic released Anthropic Interviewer, an AI tool for running qualitative interviews at scale, along with a public dataset of...
4 months ago cs.CR cs.AI cs.CY
Benchmark MEDIUM
Zhi Yang, Runguo Li, Qiqi Qiang +15 more
Financial agents powered by large language models (LLMs) are increasingly deployed for investment analysis, risk assessment, and automated...
4 months ago cs.CR cs.AI
Benchmark MEDIUM
Suyash Mishra, Qiang Li, Srikanth Patil +1 more
Vision Language Models (VLMs) are poised to revolutionize the digital transformation of the pharmaceutical industry by enabling intelligent, scalable,...
4 months ago cs.CV cs.LG
Benchmark MEDIUM
Konstantinos E. Kampourakis, Vyron Kampourakis, Efstratios Chatzoglou +2 more
Realistic, large-scale, and well-labeled cybersecurity datasets are essential for training and evaluating Intrusion Detection Systems (IDS). However,...
Benchmark LOW
Seyeon Jeong, Yeonjun Choi, JongWook Kim +1 more
Large Language Models (LLMs) suffer from hallucinations and factual inaccuracies, especially in complex reasoning and fact verification tasks....
Benchmark MEDIUM
Huawei Zheng, Xinqi Jiang, Sen Yang +3 more
Large language models (LLMs) are increasingly applied in specialized domains such as finance and healthcare, where they introduce unique safety...
4 months ago cs.CL cs.AI
Benchmark LOW
Jacob Ede Levine, Yun Lyan Luo, Sai Chandra Kosaraju
The design of reliable, valid, and diverse molecules is fundamental to modern drug discovery, as improved molecular generation supports efficient...
4 months ago cs.LG cs.AI
Benchmark LOW
Atharv Naphade
Retrieval-Augmented Generation (RAG) is the prevailing paradigm for grounding Large Language Models (LLMs), yet the mechanisms governing how models...
4 months ago cs.AI cs.LG