Benchmark LOW
Seyeon Jeong, Yeonjun Choi, JongWook Kim +1 more
Large Language Models (LLMs) suffer from hallucinations and factual inaccuracies, especially in complex reasoning and fact verification tasks....
Benchmark MEDIUM
Huawei Zheng, Xinqi Jiang, Sen Yang +3 more
Large language models (LLMs) are increasingly applied in specialized domains such as finance and healthcare, where they introduce unique safety...
4 months ago cs.CL cs.AI
Benchmark LOW
Jacob Ede Levine, Yun Lyan Luo, Sai Chandra Kosaraju
The design of reliable, valid, and diverse molecules is fundamental to modern drug discovery, as improved molecular generation supports efficient...
4 months ago cs.LG cs.AI
Benchmark LOW
Atharv Naphade
Retrieval-Augmented Generation (RAG) is the prevailing paradigm for grounding Large Language Models (LLMs), yet the mechanisms governing how models...
4 months ago cs.AI cs.LG
Benchmark LOW
Xinyue Lou, Jinan Xu, Jingyi Yin +8 more
As Multimodal Large Language Models (MLLMs) become indispensable assistants in human life, the unsafe content generated by MLLMs poses a danger to...
Benchmark LOW
Haeun Jang, Hwan Chang, Hwanhee Lee
The deployment of Large Vision-Language Models (LVLMs) for real-world document question answering is often constrained by dynamic, user-defined...
Benchmark MEDIUM
Xiaoyu Xu, Minxin Du, Zitong Li +6 more
Although machine unlearning is essential for removing private, harmful, or copyrighted content from LLMs, current benchmarks often fail to faithfully...
4 months ago cs.CL cs.AI cs.CR
Benchmark MEDIUM
Dinesh Srivasthav P, Ashok Urlana, Rahul Mishra +2 more
Machine unlearning aims to selectively remove the influence of specific training samples to satisfy privacy regulations such as the GDPR's 'Right to...
4 months ago cs.CR cs.AI cs.CL
Benchmark LOW
Jin Wang, Liang Lin, Kaiwen Luo +8 more
While Audio Large Language Models (ALLMs) have achieved remarkable progress in understanding and generation, their potential privacy implications...
Benchmark HIGH
Quy-Anh Dang, Chris Ngo, Truong-Son Hy
As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount....
Benchmark HIGH
Zejian Chen, Chaozhuo Li, Chao Li +3 more
This paper provides a systematic survey of jailbreak attacks and defenses on Large Language Models (LLMs) and Vision-Language Models (VLMs),...
Benchmark LOW
Shidong Cao, Hongzhan Lin, Yuxuan Gu +2 more
Chain-of-Thought (CoT) reasoning improves multi-step mathematical problem solving in large language models but remains vulnerable to exposure bias...
Benchmark HIGH
Xiangzhe Yuan, Zhenhao Zhang, Haoming Tang +1 more
As LLMs gain persuasive agentic capabilities through extended dialogues, they introduce novel risks in multi-turn conversational scams that...
Benchmark LOW
Nick Pepper, Adam Keane, Amy Hodgkin +11 more
This paper presents the first probabilistic Digital Twin of operational en route airspace, developed for the London Area Control Centre. The Digital...
Benchmark LOW
Chengcheng Feng, Haojie Yin, Yucheng Jin +1 more
Comic-based visual question answering (CVQA) poses distinct challenges to multimodal large language models (MLLMs) due to its reliance on symbolic...
4 months ago cs.CV cs.AI
Benchmark LOW
Sunny Gupta, Shounak Das, Amit Sethi
Vision language foundation models such as CLIP exhibit impressive zero-shot generalization yet remain vulnerable to spurious correlations across...
4 months ago cs.CV cs.AI cs.LG
Benchmark MEDIUM
Antonio Colacicco, Vito Guida, Dario Di Palma +2 more
Large Language Models (LLMs) are increasingly applied in recommendation scenarios due to their strong natural language understanding and generation...
4 months ago cs.IR cs.AI cs.CL
Benchmark LOW
Bin Xu
AI agents -- systems that combine foundation models with reasoning, planning, memory, and tool use -- are rapidly becoming a practical interface...
Benchmark MEDIUM
Jinwei Hu, Xinmiao Huang, Youcheng Sun +2 more
As large language models (LLMs) transition to autonomous agents synthesizing real-time information, their reasoning capabilities introduce an...
4 months ago cs.CL cs.AI cs.MA
Benchmark MEDIUM
Junyu Liu, Zirui Li, Qian Niu +7 more
As Large Language Models (LLMs) are increasingly deployed in the healthcare field, it becomes essential to carefully evaluate their medical safety before...
4 months ago cs.CL cs.AI