Benchmark HIGH
Xuhui Dou, Hayretdin Bahsi, Alejandro Guerra-Manzanares
Recent work applies Large Language Models (LLMs) to source-code vulnerability detection, but most evaluations still rely on random train-test splits...
2 months ago cs.CR cs.AI cs.LG
Benchmark HIGH
Mirae Kim, Seonghun Jeong, Youngjun Kwak
Jailbreaking poses a significant risk to the deployment of Large Language Models (LLMs) and Vision Language Models (VLMs). VLMs are particularly...
2 months ago cs.CL cs.AI cs.DB
Benchmark HIGH
Priyaranjan Pattnayak, Sanchari Chowdhuri
Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities...
2 months ago cs.AI cs.CL
Benchmark HIGH
Haoyu Li, Xijia Che, Yanhao Wang +2 more
Proof-of-Vulnerability (PoV) generation is a critical task in software security, serving as a cornerstone for vulnerability validation, false...
2 months ago cs.SE cs.CR
Benchmark HIGH
André Storhaug, Jiamou Sun, Jingyue Li
Identifying vulnerability-fixing commits corresponding to disclosed CVEs is essential for secure software maintenance but remains challenging at...
2 months ago cs.SE cs.AI cs.CR
Benchmark HIGH
Adriana Alvarado Garcia, Ruyuan Wan, Ozioma C. Oguine +1 more
Recently, red teaming, with roots in security, has become a key evaluative approach to ensure the safety and reliability of Generative Artificial...
3 months ago cs.CY cs.AI cs.CL
Benchmark HIGH
Chaeyun Kim, YongTaek Lim, Kihyun Kim +2 more
Existing red-teaming benchmarks, when adapted to new languages via direct translation, fail to capture socio-technical vulnerabilities rooted in...
3 months ago cs.CY cs.AI
Benchmark HIGH
Yuhang Wang, Feiming Xu, Zheng Lin +6 more
Although large language model (LLM)-based agents, exemplified by OpenClaw, are increasingly evolving from task-oriented systems into personalized AI...
Benchmark HIGH
Nanda Rani, Kimberly Milner, Minghao Shao +9 more
Real-world offensive security operations are inherently open-ended: attackers explore unknown attack surfaces, revise hypotheses under uncertainty,...
3 months ago cs.CR cs.AI cs.MA
Benchmark HIGH
Tianyi Wu, Mingzhe Du, Yue Liu +4 more
Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to...
3 months ago cs.CR cs.AI cs.CL
Benchmark HIGH
Li Lu, Yanjie Zhao, Hongzhou Rao +2 more
Large Language Models (LLMs) have demonstrated remarkable proficiency in vulnerability detection. However, a critical reliability gap persists:...
Benchmark HIGH
Junhyeok Lee, Han Jang, Kyu Sung Choi
Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly integrated into clinical workflows; however, prompt...
3 months ago cs.CL cs.LG
Benchmark HIGH
Hao Li, Ruoyao Wen, Shanghao Shi +2 more
AI agents that autonomously interact with external tools and environments show great promise across real-world applications. However, the external...
Benchmark HIGH
Yunpeng Xiong, Ting Zhang
Static Application Security Testing (SAST) tools are essential for identifying software vulnerabilities, but they often produce a high volume of...
Benchmark HIGH
Ivan K. Tung, Yu Xiang Shi, Alex Chien +2 more
Creating attack paths for cyber defence exercises requires substantial expert effort. Existing automation requires vulnerability graphs or exploit...
3 months ago cs.CR cs.AI
Benchmark HIGH
Miao Lin, Feng Yu, Rui Ning +6 more
Deep neural networks are highly susceptible to backdoor attacks, yet most defense methods to date rely on balanced data, overlooking the pervasive...
3 months ago cs.CR cs.CV cs.LG
Benchmark HIGH
Thomas Heverin
Prompt injection evaluations typically treat refusal as a stable, binary indicator of safety. This study challenges that paradigm by modeling refusal...
Benchmark HIGH
Zelong Zheng, Jiayuan Zhou, Xing Hu +2 more
Software vulnerability management has become increasingly critical as modern systems scale in size and complexity. However, existing automated...
Benchmark HIGH
Fan Huang, Haewoon Kwak, Jisun An
Large Language Models (LLMs) are increasingly employed in various question-answering tasks. However, recent studies showcase that LLMs are...
3 months ago cs.CL cs.AI
Benchmark HIGH
Jiayi Yuan, Jonathan Nöther, Natasha Jaques +1 more
While recent automated red-teaming methods show promise for systematically exposing model vulnerabilities, most existing approaches rely on...
3 months ago cs.AI cs.NE