Benchmark MEDIUM
Vladyslav Larin, Ihor Naumenko, Aleksei Ivashov +2 more
As centralized AI hits compute ceilings and diminishing returns from ever-larger training runs, meeting demand requires an inference layer that...
4 months ago cs.LG cs.AI cs.CL
Benchmark MEDIUM
Hiromu Takahashi, Shotaro Ishihara
We propose Fast-MIA (https://github.com/Nikkei/fast-mia), a Python library for efficiently evaluating membership inference attacks (MIA) against...
4 months ago cs.CR cs.CL
Benchmark MEDIUM
Armin Gerami, Kazem Faghih, Ramani Duraiswami
Retrieval Augmented Generation (RAG) enhances Large Language Models (LLMs) by connecting them to external knowledge, improving accuracy and reducing...
4 months ago cs.IR cs.AI cs.CL
Benchmark MEDIUM
Julia Bazinska, Max Mathys, Francesco Casucci +4 more
AI agents powered by large language models (LLMs) are being deployed at scale, yet we lack a systematic understanding of how the choice of backbone...
5 months ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Hao Zheng, Zirui Pang, Ling Li +5 more
Advances in Multimodal Large Language Models (MLLMs) intensify concerns about data privacy, making Machine Unlearning (MU), the selective removal of...
5 months ago cs.AI cs.CL
Benchmark MEDIUM
Mojtaba Eshghie, Gabriele Morello, Matteo Lauretano +2 more
Smart contract vulnerabilities cost billions of dollars annually, yet existing automated analysis tools fail to generate deployable defenses. We...
5 months ago cs.CR cs.SE
Benchmark MEDIUM
Christoph Bühler, Matteo Biagiola, Luca Di Grazia +1 more
Large Language Models (LLMs) have evolved into AI agents that interact with external tools and environments to perform complex tasks. The Model...
5 months ago cs.CR cs.AI cs.SE
Benchmark MEDIUM
Divyanshu Kumar, Nitin Aravind Birur, Tanay Baswa +2 more
Frontier Large Language Models (LLMs) pose unprecedented dual-use risks through the potential proliferation of chemical, biological, radiological,...
5 months ago cs.CR cs.AI
Benchmark MEDIUM
Chengcan Wu, Zhixin Zhang, Mingqian Xu +2 more
Large Language Model (LLM)-based Multi-Agent Systems (MAS) have become a popular paradigm of AI applications. However, trustworthiness issues in MAS...
5 months ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Alexander Nemecek, Zebin Yun, Zahra Rahmani +4 more
As large language models (LLMs) become progressively more embedded in clinical decision-support, documentation, and patient-information systems,...
5 months ago cs.CR cs.AI
Benchmark MEDIUM
Marco Alecci, Jordan Samhi, Tegawendé F. Bissyandé +1 more
Mobile apps often embed authentication secrets, such as API keys, tokens, and client IDs, to integrate with cloud services. However, developers often...
5 months ago cs.CR cs.SE
Benchmark MEDIUM
Giovanni De Muri, Mark Vero, Robin Staab +1 more
LLMs are often used by downstream users as teacher models for knowledge distillation, compressing their capabilities into memory-efficient models....
5 months ago cs.LG cs.AI cs.CR
Benchmark MEDIUM
Yixuan Liu, Xinlei Li, Yi Li
Phishing attacks in Web3 ecosystems are increasingly sophisticated, exploiting deceptive contract logic, malicious frontend scripts, and token...
Benchmark MEDIUM
Shivam Ratnakar, Sanjay Raghavendra
Integration of Large Language Models with search/retrieval engines has become ubiquitous, yet these systems harbor a critical vulnerability that...
5 months ago cs.CL cs.AI
Benchmark MEDIUM
David Peer, Sebastian Stabinger
Large Language Models (LLMs) have demonstrated impressive capabilities, yet their deployment in high-stakes domains is hindered by inherent...
5 months ago cs.CL cs.AI
Benchmark MEDIUM
Shuai Li, Kejiang Chen, Jun Jiang +5 more
Large Language Models (LLMs) have demonstrated remarkable capabilities, but their training requires extensive data and computational resources,...
Benchmark MEDIUM
Qiushi Wu, Yue Xiao, Dhilung Kirat +3 more
Fixing bugs in large programs is a challenging task that demands substantial time and effort. Once a bug is found, it is reported to the project...
5 months ago cs.SE cs.AI
Benchmark MEDIUM
Yibo Peng, James Song, Lei Li +6 more
Code agents are increasingly trusted to autonomously fix bugs on platforms such as GitHub, yet their security evaluation focuses almost exclusively...
5 months ago cs.CR cs.SE
Benchmark MEDIUM
Jonghyun Park, Minhyuk Seo, Jonghyun Choi
One of the key challenges of modern AI models is ensuring that they provide helpful responses to benign queries while refusing malicious ones. But...
Benchmark MEDIUM
Xin Zhao, Xiaojun Chen, Bingshan Liu +3 more
Large language models (LLMs) with Mixture-of-Experts (MoE) architectures achieve impressive performance and efficiency by dynamically routing inputs...