Benchmark HIGH
Euntae Kim, Soomin Han, Buru Chang
Large language models (LLMs) are increasingly used as co-authors in collaborative writing, where users begin with rough drafts and rely on LLMs to...
Benchmark MEDIUM
Ahson Saiyed, Sabrina Sadiekh, Chirag Agarwal
Large Language Models (LLMs) remain vulnerable to optimization-based jailbreak attacks that exploit internal gradient structure. While Sparse...
3 weeks ago cs.LG cs.AI cs.CL
Benchmark MEDIUM
Sina Abdollahi, Mohammad M Maheri, Javad Forough +5 more
Large Language Model (LLM) agents provide powerful automation capabilities, but they also create a substantially broader attack surface than...
3 weeks ago cs.CR cs.OS
Benchmark LOW
Yanzhen Lu, Muchen Jiang, Zhicheng Qian +1 more
Prompt-injected memory can improve reasoning without updating model weights, but it also creates a control problem: retrieved content helps only when...
Benchmark LOW
Sua Lee, Sanghee Park, Jinbae Im
Multimodal Large Language Models (MLLMs) have been increasingly used as automatic evaluators, a paradigm known as MLLM-as-a-Judge. However, their...
3 weeks ago cs.CL cs.AI cs.CV
Benchmark HIGH
Parteek Jamwal, Minghao Shao, Boyuan Chen +15 more
Large Language Models (LLMs) have demonstrated remarkable capabilities across various cybersecurity tasks, including vulnerability classification,...
3 weeks ago cs.CR cs.AI cs.MA
Benchmark MEDIUM
Ziyao Tang, Pengkun Jiao, Bin Zhu +3 more
Video Large Language Models (Vid-LLMs) have demonstrated remarkable performance in video understanding tasks, yet their robustness under...
Benchmark MEDIUM
Shozo Saeki, Minoru Kawahara, Hirohisa Aman
A nearest-neighbor framework is a fundamental tool for various applications involving Large Language Models (LLMs) and Visual Language Models (VLMs)....
Benchmark MEDIUM
Yihao Zou, Tianming Zheng, Futai Zou +1 more
Fuzzing has become a widely adopted technique for vulnerability discovery, yet it remains ineffective for structured-input programs due to strict...
3 weeks ago cs.CR cs.PL
Benchmark LOW
Khang Tran, Khoa Nguyen, Cristian Borcea +1 more
Recent advances in large language models for test case generation have improved branch coverage via prompt-engineered mutations. However, they still...
3 weeks ago cs.SE cs.LG
Benchmark HIGH
Ivan Bercovich, Ivgeni Segal, Kexun Zhang +3 more
We release Terminal Wrench, a subset of 331 terminal-agent benchmark environments, copied from the popular open benchmarks that are demonstrably...
3 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Dongwook Lee, Eunwoo Song, Che Hyun Lee +2 more
While recent Spoken Language Models (SLMs) have been actively deployed in real-world scenarios, they lack the capability to discern Third-Party...
3 weeks ago cs.CL cs.AI cs.SD
Benchmark MEDIUM
Rina Mishra, Gaurav Varshney, Doddipatla Sesha Sahithi
The rapid adoption of open-source Large Language Models (LLMs) in offline and enterprise environments has introduced a largely unexamined security...
Benchmark LOW
Madhav Agarwal, Sotirios A. Tsaftaris, Laura Sevilla-Lara +1 more
Understanding emotions is a fundamental ability for intelligent systems to be able to interact with humans. Vision-language models (VLMs) have made...
3 weeks ago cs.CV cs.AI
Benchmark MEDIUM
Djiré Albérick Euraste, Kaboré Abdoul Kader, Jordan Samhi +3 more
The lack of transparency about code datasets used to train large language models (LLMs) makes it difficult to detect, evaluate, and mitigate data...
Benchmark MEDIUM
Xixun Lin, Yang Liu, Yancheng Chen +9 more
The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use,...
3 weeks ago cs.CR cs.AI
Benchmark LOW
Eun Woo Im, Dhruv Madhwal, Vivek Gupta
Vision-Language Models demonstrate remarkable capabilities but often struggle with compositional reasoning, exhibiting vulnerabilities regarding word...
Benchmark MEDIUM
Prajas Wadekar, Venkata Sai Pranav Bachina, Kunal Bhosikar +2 more
3D Gaussian Splatting (3DGS) has recently enabled highly photorealistic 3D reconstruction from casually captured multi-view images. However, this...
4 weeks ago cs.CV cs.CR cs.LG
Benchmark MEDIUM
Joel Fokou
Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise...
4 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Miit Daga, Swarna Priya Ramu
Organisations increasingly outsource privacy-sensitive data transformations to cloud providers, yet no practical mechanism lets the data owner verify...
4 weeks ago cs.CR cs.DB cs.LG