Benchmark MEDIUM
He Yang Yuan, Xin Wang, Kundi Yao +3 more
Logging code plays an important role in software systems by recording key events and behaviors, which are essential for debugging and monitoring....
2 weeks ago cs.SE cs.AI cs.CR
Benchmark MEDIUM
Girish, Mohd Mujtaba Akhtar, Orchid Chetia Phukan +1 more
The rapid advancement of Audio Large Language Models (ALMs), driven by Neural Audio Codecs (NACs), has led to the emergence of highly realistic...
Benchmark MEDIUM
Robert Stanley, Avi Verma, Lillian Tsai +2 more
AI agents promise to serve as general-purpose personal assistants for their users, which requires them to have access to private user data (e.g.,...
3 weeks ago cs.CR cs.AI cs.OS
Benchmark MEDIUM
Alankrit Chona, Igor Kozlov, Ambuj Kumar
We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of...
3 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Ali Al-Kaswan, Maksim Plotnikov, Maxim Hájek +3 more
Large Language Model (LLM) agents are increasingly proposed for autonomous cybersecurity tasks, but their capabilities in realistic offensive...
3 weeks ago cs.AI cs.CR cs.SE
Benchmark MEDIUM
Ahson Saiyed, Sabrina Sadiekh, Chirag Agarwal
Large Language Models (LLMs) remain vulnerable to optimization-based jailbreak attacks that exploit internal gradient structure. While Sparse...
3 weeks ago cs.LG cs.AI cs.CL
Benchmark MEDIUM
Sina Abdollahi, Mohammad M Maheri, Javad Forough +5 more
Large Language Model (LLM) agents provide powerful automation capabilities, but they also create a substantially broader attack surface than...
3 weeks ago cs.CR cs.OS
Benchmark MEDIUM
Ziyao Tang, Pengkun Jiao, Bin Zhu +3 more
Video Large Language Models (Vid-LLMs) have demonstrated remarkable performance in video understanding tasks, yet their robustness under...
Benchmark MEDIUM
Shozo Saeki, Minoru Kawahara, Hirohisa Aman
A nearest-neighbor framework is a fundamental tool for various applications involving Large Language Models (LLMs) and Visual Language Models (VLMs)....
Benchmark MEDIUM
Yihao Zou, Tianming Zheng, Futai Zou +1 more
Fuzzing has become a widely adopted technique for vulnerability discovery, yet it remains ineffective for structured-input programs due to strict...
3 weeks ago cs.CR cs.PL
Benchmark MEDIUM
Dongwook Lee, Eunwoo Song, Che Hyun Lee +2 more
While recent Spoken Language Models (SLMs) have been actively deployed in real-world scenarios, they lack the capability to discern Third-Party...
3 weeks ago cs.CL cs.AI cs.SD
Benchmark MEDIUM
Rina Mishra, Gaurav Varshney, Doddipatla Sesha Sahithi
The rapid adoption of open-source Large Language Models (LLMs) in offline and enterprise environments has introduced a largely unexamined security...
Benchmark MEDIUM
Djiré Albérick Euraste, Kaboré Abdoul Kader, Jordan Samhi +3 more
The lack of transparency about code datasets used to train large language models (LLMs) makes it difficult to detect, evaluate, and mitigate data...
Benchmark MEDIUM
Xixun Lin, Yang Liu, Yancheng Chen +9 more
The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use,...
3 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Prajas Wadekar, Venkata Sai Pranav Bachina, Kunal Bhosikar +2 more
3D Gaussian Splatting (3DGS) has recently enabled highly photorealistic 3D reconstruction from casually captured multi-view images. However, this...
4 weeks ago cs.CV cs.CR cs.LG
Benchmark MEDIUM
Joel Fokou
Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise...
4 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Miit Daga, Swarna Priya Ramu
Organisations increasingly outsource privacy-sensitive data transformations to cloud providers, yet no practical mechanism lets the data owner verify...
4 weeks ago cs.CR cs.DB cs.LG
Benchmark MEDIUM
Rui Yin, Tianxu Han, Naen Xu +8 more
Safety-aligned large language models (LLMs) are increasingly deployed in real-world pipelines, yet this deployment also enlarges the supply-chain...
4 weeks ago cs.CR cs.CL
Benchmark MEDIUM
Pei-Yu Tseng, Lan Zhang, ZihDwo Yeh +3 more
Cyber Threat Intelligence (CTI) reports contain Indicators of Compromise (IOCs) that are critical for security operations. To operationalize these...