Benchmark LOW
Xinyue Lou, Jinan Xu, Jingyi Yin +8 more
As Multimodal Large Language Models (MLLMs) become indispensable assistants in human life, the unsafe content generated by MLLMs poses a danger to...
Benchmark LOW
Haeun Jang, Hwan Chang, Hwanhee Lee
The deployment of Large Vision-Language Models (LVLMs) for real-world document question answering is often constrained by dynamic, user-defined...
Benchmark MEDIUM
Xiaoyu Xu, Minxin Du, Zitong Li +6 more
Although machine unlearning is essential for removing private, harmful, or copyrighted content from LLMs, current benchmarks often fail to faithfully...
4 months ago cs.CL cs.AI cs.CR
Benchmark MEDIUM
Dinesh Srivasthav P, Ashok Urlana, Rahul Mishra +2 more
Machine unlearning aims to selectively remove the influence of specific training samples to satisfy privacy regulations such as the GDPR's 'Right to...
4 months ago cs.CR cs.AI cs.CL
Benchmark LOW
Jin Wang, Liang Lin, Kaiwen Luo +8 more
While Audio Large Language Models (ALLMs) have achieved remarkable progress in understanding and generation, their potential privacy implications...
Benchmark HIGH
Quy-Anh Dang, Chris Ngo, Truong-Son Hy
As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount....
Benchmark HIGH
Zejian Chen, Chaozhuo Li, Chao Li +3 more
This paper provides a systematic survey of jailbreak attacks and defenses on Large Language Models (LLMs) and Vision-Language Models (VLMs),...
Benchmark LOW
Shidong Cao, Hongzhan Lin, Yuxuan Gu +2 more
Chain-of-Thought (CoT) reasoning improves multi-step mathematical problem solving in large language models but remains vulnerable to exposure bias...
Benchmark HIGH
Xiangzhe Yuan, Zhenhao Zhang, Haoming Tang +1 more
As LLMs gain persuasive agentic capabilities through extended dialogues, they introduce novel risks in multi-turn conversational scams that...
Benchmark LOW
Nick Pepper, Adam Keane, Amy Hodgkin +11 more
This paper presents the first probabilistic Digital Twin of operational en route airspace, developed for the London Area Control Centre. The Digital...
Benchmark LOW
Chengcheng Feng, Haojie Yin, Yucheng Jin +1 more
Comic-based visual question answering (CVQA) poses distinct challenges to multimodal large language models (MLLMs) due to its reliance on symbolic...
4 months ago cs.CV cs.AI
Benchmark LOW
Sunny Gupta, Shounak Das, Amit Sethi
Vision language foundation models such as CLIP exhibit impressive zero-shot generalization yet remain vulnerable to spurious correlations across...
4 months ago cs.CV cs.AI cs.LG
Benchmark MEDIUM
Antonio Colacicco, Vito Guida, Dario Di Palma +2 more
Large Language Models (LLMs) are increasingly applied in recommendation scenarios due to their strong natural language understanding and generation...
4 months ago cs.IR cs.AI cs.CL
Benchmark LOW
Bin Xu
AI agents -- systems that combine foundation models with reasoning, planning, memory, and tool use -- are rapidly becoming a practical interface...
Benchmark MEDIUM
Jinwei Hu, Xinmiao Huang, Youcheng Sun +2 more
As large language models (LLMs) transition to autonomous agents synthesizing real-time information, their reasoning capabilities introduce an...
4 months ago cs.CL cs.AI cs.MA
Benchmark MEDIUM
Junyu Liu, Zirui Li, Qian Niu +7 more
As Large Language Models (LLMs) are increasingly deployed in the healthcare field, it becomes essential to carefully evaluate their medical safety before...
4 months ago cs.CL cs.AI
Benchmark HIGH
Songyang Liu, Chaozhuo Li, Rui Pu +5 more
Jailbreak attacks present a significant challenge to the safety of Large Language Models (LLMs), yet current automated evaluation methods largely...
4 months ago cs.CR cs.CL
Benchmark MEDIUM
Muntasir Adnan, Carlos C. N. Kuhn
Large Language Models have become integral to software development, yet they frequently generate vulnerable code. Existing code vulnerability...
4 months ago cs.SE cs.AI
Benchmark MEDIUM
Zhuoran Tan, Run Hao, Jeremy Singer +2 more
Tool-augmented LLM agents raise new security risks: tool executions can introduce runtime-only behaviors, including prompt injection and unintended...
4 months ago cs.CR cs.SE
Benchmark MEDIUM
Milad Rahmati, Nima Rahmati
The proliferation of Internet of Things devices in critical infrastructure has created unprecedented cybersecurity challenges, necessitating...
4 months ago cs.CR cs.LG