Benchmark LOW
Paschal C. Amusuo, Dongge Liu, Ricardo Andres Calvo Mendez +3 more
Fuzz testing has become a cornerstone technique for identifying software bugs and security vulnerabilities, with broad adoption in both industry and...
5 months ago cs.SE cs.CR cs.MA
Benchmark MEDIUM
Zhaoyan Wang, Zheng Gao, Arogya Kharel +1 more
Graph Neural Networks (GNNs) are widely adopted in Web-related applications, serving as a core technique for learning from graph-structured data,...
5 months ago cs.LG cs.AI
Benchmark LOW
Clara Maathuis, Kasper Cools
In a time of rapidly evolving military threats and increasingly complex operational environments, the integration of AI into military operations...
Benchmark MEDIUM
Luoxi Tang, Yuqiao Meng, Ankita Patra +3 more
Large Language Models (LLMs) are intensively used to assist security analysts in counteracting the rapid exploitation of cyber threats, wherein LLMs...
5 months ago cs.CR cs.AI
Benchmark MEDIUM
Luca Cotti, Idilio Drago, Anisa Rula +2 more
System logs represent a valuable source of Cyber Threat Intelligence (CTI), capturing attacker behaviors, exploited vulnerabilities, and traces of...
Benchmark HIGH
Yinuo Liu, Ruohan Xu, Xilong Wang +2 more
Multiple prompt injection attacks have been proposed against web agents. At the same time, various methods have been developed to detect general...
5 months ago cs.CR cs.AI cs.CL
Benchmark LOW
Zhengliang Shi, Ruotian Ma, Jen-tse Huang +14 more
Large language models (LLMs) are increasingly entrusted with high-stakes decisions that affect human welfare. However, the principles and values that...
5 months ago cs.CL cs.AI cs.CY
Benchmark MEDIUM
Yicheng Lang, Yihua Zhang, Chongyu Fan +3 more
Large language model (LLM) unlearning aims to surgically remove the influence of undesired data or knowledge from an existing model while preserving...
Benchmark LOW
Chen-An Li, Tzu-Han Lin, Hung-yi Lee
Large audio-language models (LALMs) unify speech and text processing, but their robustness in noisy real-world settings remains underexplored. We...
5 months ago cs.SD cs.CL
Benchmark MEDIUM
Andrew Gan, Zahra Ghodsi
Machine learning systems increasingly rely on open-source artifacts such as datasets and models that are created or hosted by other parties. The...
Benchmark HIGH
Haoran Xi, Minghao Shao, Brendan Dolan-Gavitt +2 more
Large language models show promise for vulnerability discovery, yet prevailing methods inspect code in isolation, struggle with long contexts, and...
5 months ago cs.SE cs.CR cs.LG
Benchmark MEDIUM
Ehsan Aghaei, Sarthak Jain, Prashanth Arun +1 more
Effective analysis of cybersecurity and threat intelligence data demands language models that can interpret specialized terminology, complex document...
5 months ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Matheus Vinicius da Silva de Oliveira, Jonathan de Andrade Silva, Awdren de Lima Fontao
Large Language Models (LLMs) are widely used across multiple domains but continue to raise concerns regarding security and fairness. Beyond known...
5 months ago cs.AI cs.IR cs.LG
Benchmark LOW
Seiji Maekawa, Jackson Hassell, Pouya Pezeshkpour +2 more
Existing benchmarks for tool-augmented language models (TaLMs) lack fine-grained control over task difficulty and remain vulnerable to data...
5 months ago cs.CL cs.PL
Benchmark LOW
Yao Tong, Haonan Wang, Siquan Li +2 more
Fingerprinting Large Language Models (LLMs) is essential for provenance verification and model attribution. Existing methods typically extract...
5 months ago cs.CR cs.AI cs.CL
Benchmark LOW
Joel Dyer, Daniel Jarne Ornia, Nicholas Bishop +2 more
Evaluating the safety of frontier AI systems is an increasingly important concern, helping to measure the capabilities of such models and identify...
5 months ago cs.LG cs.AI stat.ML
Benchmark LOW
Yixu Wang, Xin Wang, Yang Yao +4 more
The rapid integration of Large Language Models (LLMs) into high-stakes domains necessitates reliable safety and compliance evaluation. However,...
Benchmark HIGH
Simin Chen, Yixin He, Suman Jana +1 more
LLM-based agents are increasingly deployed for software maintenance tasks such as automated program repair (APR). APR agents automatically fetch...
Benchmark LOW
Ruolin Chen, Yinqian Sun, Jihang Wang +3 more
Embodied agents powered by large language models (LLMs) inherit advanced planning capabilities; however, their direct interaction with the physical...
Benchmark LOW
Xiang Zhang, Kun Wei, Xu Yang +3 more
As Large Language Models (LLMs) become increasingly prevalent, their security vulnerabilities have already drawn attention. Machine unlearning is...
5 months ago cs.LG cs.CL