Benchmark MEDIUM
Chung-ju Huang, Huiqiang Zhao, Yuanpeng He +5 more
The increasing reliance on cloud-hosted Large Language Models (LLMs) exposes sensitive client data, such as prompts and responses, to potential...
2 months ago cs.CR cs.AI cs.CL
Benchmark MEDIUM
David Condrey
The proliferation of AI-generated text has intensified the need for reliable authorship verification, yet current output-based methods are...
2 months ago cs.CR cs.HC cs.LG
Benchmark LOW
Zhengqing Yuan, Kaiwen Shi, Zheyuan Zhang +3 more
Scientific research relies on accurate citation for attribution and integrity, yet large language models (LLMs) introduce a new risk: fabricated...
2 months ago cs.CL cs.DL
Benchmark LOW
Yuan Liang, Ruobin Zhong, Haoming Xu +46 more
Current AI agents can flexibly invoke tools and execute complex tasks, yet their long-term advancement is hindered by the lack of systematic...
2 months ago cs.AI cs.CL cs.CV
Benchmark LOW
Jiazheng Quan, Xiaodong Li, Bin Wang +5 more
Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities....
2 months ago cs.CR cs.AI cs.SE
Benchmark LOW
Balazs Pejo
Federated learning offers a privacy-friendly collaborative learning framework, yet its success, like any joint venture, hinges on the contributions...
2 months ago cs.LG cs.CR
Benchmark LOW
Vladimer Khasia
The pursuit of world model based artificial intelligence has predominantly relied on projecting high-dimensional observations into parameterized...
Benchmark MEDIUM
Nazanin Mohammadi Sepahvand, Eleni Triantafillou, Hugo Larochelle +3 more
Large language models (LLMs) trained on webscale data can produce toxic outputs, raising concerns for safe deployment. Prior defenses, based on...
Benchmark LOW
Mohammed Cherifi
Public EV charging infrastructure suffers from significant failure rates -- with field studies reporting up to 27.5% of DC fast chargers...
2 months ago cs.DC cs.AI cs.LG
Benchmark MEDIUM
Guangnian Wan, Qi Li, Gongfan Fang +2 more
Multimodal Diffusion Language Models (MDLMs) have recently emerged as a competitive alternative to their autoregressive counterparts. Yet their...
2 months ago cs.CR cs.LG
Benchmark MEDIUM
Longxiang Wang, Xiang Zheng, Xuhao Zhang +3 more
Multi-tenant LLM serving frameworks widely adopt shared Key-Value caches to enhance efficiency. However, this creates side-channel vulnerabilities...
2 months ago cs.CR cs.AI
Benchmark MEDIUM
Lei Ba, Qinbin Li, Songze Li
LLM-based code interpreter agents are increasingly deployed in critical workflows, yet their robustness against risks introduced by their code...
Benchmark MEDIUM
Jingwei Shi, Xinxiang Yin, Jing Huang +2 more
The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing...
2 months ago cs.SE cs.AI cs.CR
Benchmark MEDIUM
Abdullah Caglar Oksuz, Anisa Halimi, Erman Ayday
Membership inference attacks (MIAs) threaten the privacy of machine learning models by revealing whether a specific data point was used during...
2 months ago cs.LG cs.CR
Benchmark LOW
Martin Bertran, Riccardo Fogliato, Zhiwei Steven Wu
Empirical conclusions depend not only on data but on analytic decisions made throughout the research process. Many-analyst studies have quantified...
2 months ago cs.AI cs.LG
Benchmark HIGH
Mirae Kim, Seonghun Jeong, Youngjun Kwak
Jailbreaking poses a significant risk to the deployment of Large Language Models (LLMs) and Vision Language Models (VLMs). VLMs are particularly...
2 months ago cs.CL cs.AI cs.DB
Benchmark LOW
Anna Babarczy, Andras Lukacs, Peter Vedres +1 more
The study explores whether current Large Language Models (LLMs) exhibit Theory of Mind (ToM) capabilities -- specifically, the ability to infer...
2 months ago cs.CL cs.AI
Benchmark MEDIUM
Zachary Coalson, Bo Fang, Sanghyun Hong
Multi-turn interaction length is a dominant factor in the operational costs of conversational LLMs. In this work, we present a new failure mode in...
2 months ago cs.LG cs.CR
Benchmark MEDIUM
Gelei Deng, Yi Liu, Yuekang Li +5 more
LLM-based agents show promise for automating penetration testing, yet reported performance varies widely across systems and benchmarks. We analyze 28...
2 months ago cs.CR cs.SE
Benchmark LOW
Takyoung Kim, Jinseok Nam, Chandrayee Basu +5 more
Conversational agents powered by large language models (LLMs) with tool integration achieve strong performance on fixed task-oriented dialogue...
2 months ago cs.CL cs.AI