Benchmark MEDIUM
Yu Lin, Qizhi Zhang, Wenqiang Ruan +6 more
The rapid development of large language models (LLMs) has driven the widespread adoption of cloud-based LLM inference services, while also bringing...
3 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Rahul Marchand, Art O Cathain, Jerome Wynne +5 more
Large language models (LLMs) increasingly act as autonomous agents, using tools to execute code, read and write files, and access networks, creating...
3 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Huajie Chen, Tianqing Zhu, Yuchen Zhong +7 more
Dataset distillation compresses a large real dataset into a small synthetic one, enabling models trained on the synthetic data to achieve performance...
3 weeks ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Haodong Zhao, Jinming Hu, Zhaomin Wu +7 more
Federated Instruction Tuning (FIT) enables collaborative instruction tuning of large language models across multiple organizations (clients) in a...
Benchmark MEDIUM
Om Tailor
Colluding language-model agents can hide coordination in messages that remain policy-compliant at the surface level. We present CLBC, a protocol...
3 weeks ago cs.CR cs.AI eess.SY
Benchmark MEDIUM
Chung-ju Huang, Huiqiang Zhao, Yuanpeng He +5 more
The increasing reliance on cloud-hosted Large Language Models (LLMs) exposes sensitive client data, such as prompts and responses, to potential...
3 weeks ago cs.CR cs.AI cs.CL
Benchmark MEDIUM
David Condrey
The proliferation of AI-generated text has intensified the need for reliable authorship verification, yet current output-based methods are...
3 weeks ago cs.CR cs.HC cs.LG
Benchmark MEDIUM
Nazanin Mohammadi Sepahvand, Eleni Triantafillou, Hugo Larochelle +3 more
Large language models (LLMs) trained on web-scale data can produce toxic outputs, raising concerns for safe deployment. Prior defenses, based on...
Benchmark MEDIUM
Guangnian Wan, Qi Li, Gongfan Fang +2 more
Multimodal Diffusion Language Models (MDLMs) have recently emerged as a competitive alternative to their autoregressive counterparts. Yet their...
4 weeks ago cs.CR cs.LG
Benchmark MEDIUM
Longxiang Wang, Xiang Zheng, Xuhao Zhang +3 more
Multi-tenant LLM serving frameworks widely adopt shared Key-Value caches to enhance efficiency. However, this creates side-channel vulnerabilities...
4 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Lei Ba, Qinbin Li, Songze Li
LLM-based code interpreter agents are increasingly deployed in critical workflows, yet their robustness against risks introduced by their code...
Benchmark MEDIUM
Jingwei Shi, Xinxiang Yin, Jing Huang +2 more
The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing...
1 month ago cs.SE cs.AI cs.CR
Benchmark MEDIUM
Abdullah Caglar Oksuz, Anisa Halimi, Erman Ayday
Membership inference attacks (MIAs) threaten the privacy of machine learning models by revealing whether a specific data point was used during...
1 month ago cs.LG cs.CR
Benchmark MEDIUM
Zachary Coalson, Bo Fang, Sanghyun Hong
Multi-turn interaction length is a dominant factor in the operational costs of conversational LLMs. In this work, we present a new failure mode in...
1 month ago cs.LG cs.CR
Benchmark MEDIUM
Gelei Deng, Yi Liu, Yuekang Li +5 more
LLM-based agents show promise for automating penetration testing, yet reported performance varies widely across systems and benchmarks. We analyze 28...
1 month ago cs.CR cs.SE
Benchmark MEDIUM
Simon Lermen, Daniel Paleka, Joshua Swanson +3 more
We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News...
1 month ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Michael Cunningham
We present a practical system for privacy-aware large language model (LLM) inference that splits a transformer between a trusted local GPU and an...
1 month ago cs.CR cs.DC
Benchmark MEDIUM
Nivya Talokar, Ayush K Tarun, Murari Mandal +2 more
LLM-based agents execute real-world workflows via tools and memory. These affordances also enable malicious adversaries to use these agents to...
1 month ago cs.CL cs.LG
Benchmark MEDIUM
Johannes Bertram, Jonas Geiping
We introduce NESSiE, the NEceSsary SafEty benchmark for large language models (LLMs). With minimal test cases of information and access security,...
1 month ago cs.CR cs.SE
Benchmark MEDIUM
Shahriar Golchin, Marc Wetter
We systematically evaluate the quality of widely used AI safety datasets from two perspectives: in isolation and in practice. In isolation, we...
1 month ago cs.CR cs.AI cs.CL