Benchmark HIGH
Kaiwen Zhou, Ahmed Elgohary, A S M Iftekhar +1 more
The ability of LLM agents to plan and invoke tools exposes them to new safety risks, making a comprehensive red-teaming system crucial for...
4 months ago cs.CR cs.AI cs.CL
PDF
Benchmark HIGH
Euodia Dodd, Nataša Krčo, Igor Shilov +1 more
Membership inference attacks (MIAs) have emerged as the standard tool for evaluating the privacy risks of AI models. However, state-of-the-art...
5 months ago cs.LG cs.CR
PDF
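The entry above concerns membership inference attacks. As a rough illustration of the simplest MIA baseline (not this paper's method), a loss-threshold attack predicts that examples with unusually low loss were in the training set. The loss values below are made-up toy numbers, not real model output:

```python
# Minimal sketch of a loss-threshold membership inference attack:
# examples whose loss falls below a threshold are predicted to be
# training-set members. All numbers here are illustrative toys.

def predict_members(losses, threshold):
    """Return True for each example predicted to be a training member."""
    return [loss < threshold for loss in losses]

# Toy data: members tend to have lower loss than held-out non-members.
member_losses = [0.05, 0.10, 0.20, 0.08]      # seen during training
nonmember_losses = [0.90, 1.20, 0.70, 1.05]   # held out

threshold = 0.5
preds = predict_members(member_losses + nonmember_losses, threshold)
labels = [True] * len(member_losses) + [False] * len(nonmember_losses)

# Attack accuracy on this toy split (perfectly separable here;
# real models overlap far more, which is what the paper evaluates).
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

State-of-the-art attacks refine this idea with per-example calibration and shadow models, but the threshold test above is the baseline they are measured against.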
Benchmark HIGH
Osama Al Haddad, Muhammad Ikram, Ejaz Ahmed +1 more
Security analysts face increasing pressure to triage large and complex vulnerability backlogs. Large Language Models (LLMs) offer a potential aid by...
Benchmark HIGH
Pranshav Gajjar, Molham Khoja, Abiodun Ganiyu +4 more
The impending adoption of Open Radio Access Network (O-RAN) is fueling innovation in the RAN towards data-driven operation. Unlike traditional RAN...
5 months ago cs.CR cs.NI
PDF
Benchmark HIGH
Chengquan Guo, Yuzhou Nie, Chulin Xie +3 more
As large language models (LLMs) are increasingly used for code generation, concerns over the security risks have grown substantially. Early research...
Benchmark HIGH
Bin Liu, Yanjie Zhao, Guoai Xu +1 more
Large language model (LLM) agents have demonstrated remarkable capabilities in software engineering and cybersecurity tasks, including code...
5 months ago cs.SE cs.CR
PDF
Benchmark HIGH
Trilok Padhi, Pinxian Lu, Abdulkadir Erol +5 more
Large Language Model (LLM) agents are powering a growing share of interactive web applications, yet remain vulnerable to misuse and harm. Prior...
Benchmark HIGH
Ivan Dubrovsky, Anastasia Orlova, Illarion Iov +3 more
Benchmarking outcomes increasingly govern trust, selection, and deployment of LLMs, yet these evaluations remain vulnerable to semantically...
Benchmark HIGH
Dongsen Zhang, Zekun Li, Xu Luo +3 more
The Model Context Protocol (MCP) standardizes how large language model (LLM) agents discover, describe, and call external tools. While MCP unlocks...
5 months ago cs.CR cs.AI
PDF
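For context on the protocol the entry above studies: an MCP server advertises each tool as a name, a human-readable description, and a JSON-Schema input contract, and the agent calls it by name with matching arguments. A minimal sketch — the `fetch_url` tool and its fields are hypothetical examples, not from the paper:

```python
# Minimal sketch of an MCP-style tool descriptor: name, description,
# and a JSON-Schema "inputSchema" declaring the tool's arguments.
# The "fetch_url" tool itself is a hypothetical example.
tool_descriptor = {
    "name": "fetch_url",
    "description": "Fetch the contents of a web page.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Page to fetch"},
        },
        "required": ["url"],
    },
}

# An agent-side call names the tool and supplies schema-conforming
# arguments; the description text is attacker-controllable surface,
# which is one reason MCP widens the agent attack surface.
tool_call = {"name": "fetch_url", "arguments": {"url": "https://example.com"}}
```

Because the agent selects tools from their advertised names and descriptions, a malicious server can shape that metadata to steer the model, which is the class of risk benchmarks like this one probe.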
Benchmark HIGH
Haoran Ou, Kangjie Chen, Xingshuo Han +4 more
Large Language Models (LLMs) have been augmented with web search to overcome the limits of their static knowledge by accessing up-to-date...
5 months ago cs.CR cs.AI
PDF
Benchmark HIGH
Rishika Bhagwatkar, Kevin Kasa, Abhay Puri +5 more
AI agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cause...
Benchmark HIGH
Chengquan Guo, Chulin Xie, Yu Yang +6 more
Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic...
Benchmark HIGH
Yinuo Liu, Ruohan Xu, Xilong Wang +2 more
Multiple prompt injection attacks have been proposed against web agents. At the same time, various methods have been developed to detect general...
5 months ago cs.CR cs.AI cs.CL
PDF
Benchmark HIGH
Haoran Xi, Minghao Shao, Brendan Dolan-Gavitt +2 more
Large language models show promise for vulnerability discovery, yet prevailing methods inspect code in isolation, struggle with long contexts, and...
5 months ago cs.SE cs.CR cs.LG
PDF
Benchmark HIGH
Simin Chen, Yixin He, Suman Jana +1 more
LLM-based agents are increasingly deployed for software maintenance tasks such as automated program repair (APR). APR agents automatically fetch...
Benchmark HIGH
Alireza Lotfi, Charalampos Katsis, Elisa Bertino
Software vulnerabilities remain a critical security challenge, providing entry points for attackers into enterprise networks. Despite advances in...
Benchmark HIGH
Jianshuo Dong, Sheng Guo, Hao Wang +6 more
Search agents connect LLMs to the Internet, enabling them to access broader and more up-to-date information. However, this also introduces a new...
5 months ago cs.AI cs.CL cs.CR
PDF
Benchmark HIGH
Wenkai Guo, Xuefeng Liu, Haolin Wang +3 more
Fine-tuning large language models (LLMs) with local data is a widely adopted approach for organizations seeking to adapt LLMs to their specific...
6 months ago cs.LG cs.CL cs.CR
PDF