Benchmark MEDIUM
Kartik Pandit, Sourav Ganguly, Arnesh Banerjee +2 more
Ensuring safety is a foundational requirement for large language models (LLMs). Achieving an appropriate balance between enhancing the utility of...
7 months ago cs.LG cs.AI eess.SY
Benchmark MEDIUM
Imene Kerboua, Sahar Omidi Shayegan, Megh Thakkar +7 more
Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals; these pages often exceed tens...
Benchmark MEDIUM
Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru +6 more
While finetuning AI agents on interaction data -- such as web browsing or tool use -- improves their capabilities, it also introduces critical...
7 months ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Nikoo Naghavian, Mostafa Tavassolipour
Vision-language models like CLIP demonstrate impressive zero-shot generalization but remain highly vulnerable to adversarial attacks. In this work,...
Benchmark HIGH
Chengquan Guo, Chulin Xie, Yu Yang +6 more
Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic...
Benchmark MEDIUM
Chenpei Huang, Lingfeng Yao, Hui Zhong +5 more
Ear canal scanning/sensing (ECS) has emerged as a novel biometric authentication method for mobile devices paired with wireless earbuds. Existing...
7 months ago cs.CR cs.HC
Benchmark LOW
Paschal C. Amusuo, Dongge Liu, Ricardo Andres Calvo Mendez +3 more
Fuzz testing has become a cornerstone technique for identifying software bugs and security vulnerabilities, with broad adoption in both industry and...
7 months ago cs.SE cs.CR cs.MA
Benchmark MEDIUM
Zhaoyan Wang, Zheng Gao, Arogya Kharel +1 more
Graph Neural Networks (GNNs) are widely adopted in Web-related applications, serving as a core technique for learning from graph-structured data,...
7 months ago cs.LG cs.AI
Benchmark LOW
Clara Maathuis, Kasper Cools
In a time of rapidly evolving military threats and increasingly complex operational environments, the integration of AI into military operations...
Benchmark MEDIUM
Luoxi Tang, Yuqiao Meng, Ankita Patra +3 more
Large Language Models (LLMs) are intensively used to assist security analysts in counteracting the rapid exploitation of cyber threats, wherein LLMs...
7 months ago cs.CR cs.AI
Benchmark MEDIUM
Luca Cotti, Idilio Drago, Anisa Rula +2 more
System logs represent a valuable source of Cyber Threat Intelligence (CTI), capturing attacker behaviors, exploited vulnerabilities, and traces of...
Benchmark HIGH
Yinuo Liu, Ruohan Xu, Xilong Wang +2 more
Multiple prompt injection attacks have been proposed against web agents. At the same time, various methods have been developed to detect general...
7 months ago cs.CR cs.AI cs.CL
Benchmark LOW
Zhengliang Shi, Ruotian Ma, Jen-tse Huang +14 more
Large language models (LLMs) are increasingly entrusted with high-stakes decisions that affect human welfare. However, the principles and values that...
7 months ago cs.CL cs.AI cs.CY
Benchmark MEDIUM
Yicheng Lang, Yihua Zhang, Chongyu Fan +3 more
Large language model (LLM) unlearning aims to surgically remove the influence of undesired data or knowledge from an existing model while preserving...
Benchmark LOW
Chen-An Li, Tzu-Han Lin, Hung-yi Lee
Large audio-language models (LALMs) unify speech and text processing, but their robustness in noisy real-world settings remains underexplored. We...
7 months ago cs.SD cs.CL
Benchmark MEDIUM
Andrew Gan, Zahra Ghodsi
Machine learning systems increasingly rely on open-source artifacts such as datasets and models that are created or hosted by other parties. The...
Benchmark HIGH
Haoran Xi, Minghao Shao, Brendan Dolan-Gavitt +2 more
Large language models show promise for vulnerability discovery, yet prevailing methods inspect code in isolation, struggle with long contexts, and...
7 months ago cs.SE cs.CR cs.LG
Benchmark MEDIUM
Ehsan Aghaei, Sarthak Jain, Prashanth Arun +1 more
Effective analysis of cybersecurity and threat intelligence data demands language models that can interpret specialized terminology, complex document...
7 months ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Matheus Vinicius da Silva de Oliveira, Jonathan de Andrade Silva, Awdren de Lima Fontao
Large Language Models (LLMs) are widely used across multiple domains but continue to raise concerns regarding security and fairness. Beyond known...
7 months ago cs.AI cs.IR cs.LG
Benchmark LOW
Seiji Maekawa, Jackson Hassell, Pouya Pezeshkpour +2 more
Existing benchmarks for tool-augmented language models (TaLMs) lack fine-grained control over task difficulty and remain vulnerable to data...
7 months ago cs.CL cs.PL