Benchmark HIGH
Sen Fang, Weiyuan Ding, Zhezhen Cao +2 more
Large Language Models (LLMs) are increasingly adopted for vulnerability detection, yet their reasoning remains fundamentally unsound. We identify a...
4 days ago cs.SE cs.AI cs.CR
PDF
Benchmark HIGH
Iakovos-Christos Zarkadis, Christos Douligeris
Supervised detection of network attacks has always been a critical part of network intrusion detection systems (NIDS). Nowadays, in a pivotal time...
1 week ago cs.CR cs.AI stat.AP
PDF
Benchmark HIGH
Lidor Erez, Omer Hofman, Tamir Nizri +1 more
Automated LLM vulnerability scanners are increasingly used to assess security risks by measuring different attack type success rates (ASR). Yet the...
1 week ago cs.CR cs.PF
PDF
Benchmark HIGH
Siddharth Srikanth, Freddie Liang, Sophie Hsu +9 more
Vision-Language-Action (VLA) models have significant potential to enable general-purpose robotic systems for a range of vision-language tasks....
1 weeks ago cs.RO cs.AI cs.CL
PDF
Benchmark HIGH
Zheng Yu, Wenxuan Shi, Xinqian Sun +3 more
Automated Vulnerability Repair (AVR) systems, especially those leveraging large language models (LLMs), have demonstrated promising results in...
Benchmark HIGH
Masahiro Kaneko, Ayana Niwa, Timothy Baldwin
Fake news undermines societal trust and decision-making across politics, economics, health, and international relations, and in extreme cases...
3 weeks ago cs.LG cs.CL
PDF
Benchmark HIGH
Mingcheng Jiang, Jiancheng Huang, Jiangfei Wang +5 more
Static Application Security Testing (SAST) tools often suffer from high false positive rates, leading to alert fatigue that consumes valuable...
Benchmark HIGH
Zhicheng Fang, Jingjie Zheng, Chenxu Fu +1 more
Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare...
3 weeks ago cs.CR cs.AI cs.CL
PDF
Benchmark HIGH
Xuhui Dou, Hayretdin Bahsi, Alejandro Guerra-Manzanares
Recent work applies Large Language Models (LLMs) to source-code vulnerability detection, but most evaluations still rely on random train-test splits...
3 weeks ago cs.CR cs.AI cs.LG
PDF
Benchmark HIGH
Mirae Kim, Seonghun Jeong, Youngjun Kwak
Jailbreaking poses a significant risk to the deployment of Large Language Models (LLMs) and Vision Language Models (VLMs). VLMs are particularly...
1 month ago cs.CL cs.AI cs.DB
PDF
Benchmark HIGH
Priyaranjan Pattnayak, Sanchari Chowdhuri
Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities...
1 month ago cs.AI cs.CL
PDF
Benchmark HIGH
Haoyu Li, Xijia Che, Yanhao Wang +2 more
Proof-of-Vulnerability (PoV) generation is a critical task in software security, serving as a cornerstone for vulnerability validation, false...
1 month ago cs.SE cs.CR
PDF
Benchmark HIGH
André Storhaug, Jiamou Sun, Jingyue Li
Identifying vulnerability-fixing commits corresponding to disclosed CVEs is essential for secure software maintenance but remains challenging at...
1 month ago cs.SE cs.AI cs.CR
PDF
Benchmark HIGH
Adriana Alvarado Garcia, Ruyuan Wan, Ozioma C. Oguine +1 more
Recently, red teaming, with roots in security, has become a key evaluative approach to ensure the safety and reliability of Generative Artificial...
1 month ago cs.CY cs.AI cs.CL
PDF
Benchmark HIGH
Chaeyun Kim, YongTaek Lim, Kihyun Kim +2 more
Existing red-teaming benchmarks, when adapted to new languages via direct translation, fail to capture socio-technical vulnerabilities rooted in...
1 month ago cs.CY cs.AI
PDF
Benchmark HIGH
Yuhang Wang, Feiming Xu, Zheng Lin +6 more
Although large language model (LLM)-based agents, exemplified by OpenClaw, are increasingly evolving from task-oriented systems into personalized AI...
Benchmark HIGH
Nanda Rani, Kimberly Milner, Minghao Shao +9 more
Real-world offensive security operations are inherently open-ended: attackers explore unknown attack surfaces, revise hypotheses under uncertainty,...
1 month ago cs.CR cs.AI cs.MA
PDF
Benchmark HIGH
Tianyi Wu, Mingzhe Du, Yue Liu +4 more
Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to...
1 month ago cs.CR cs.AI cs.CL
PDF
Benchmark HIGH
Li Lu, Yanjie Zhao, Hongzhou Rao +2 more
Large Language Models (LLMs) have demonstrated remarkable proficiency in vulnerability detection. However, a critical reliability gap persists:...