Benchmark HIGH
Yow-Fu Liou, Yu-Chien Tang, Yu-Hsiang Liu +1 more
Benchmarking large language models (LLMs) is critical for understanding their capabilities, limitations, and robustness. In addition to interface...
Benchmark HIGH
Chutian Huang, Dake Cao, Jiacheng Ji +3 more
Background: While Large Language Models (LLMs) have achieved widespread adoption, malicious prompt engineering specifically "jailbreak attacks" poses...
Benchmark HIGH
Haoze Guo, Ziqi Wei
Retrieval-augmented generation (RAG) systems put more and more emphasis on grounding their responses in user-generated content found on the Web,...
3 months ago cs.CR cs.HC
PDF
Benchmark HIGH
Shaznin Sultana, Sadia Afreen, Nasir U. Eisty
Context: Traditional software security analysis methods struggle to keep pace with the scale and complexity of modern codebases, requiring...
Benchmark HIGH
Quy-Anh Dang, Chris Ngo, Truong-Son Hy
As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount....
Benchmark HIGH
Zejian Chen, Chaozhuo Li, Chao Li +3 more
This paper provides a systematic survey of jailbreak attacks and defenses on Large Language Models (LLMs) and Vision-Language Models (VLMs),...
Benchmark HIGH
Xiangzhe Yuan, Zhenhao Zhang, Haoming Tang +1 more
As LLMs gain persuasive agentic capabilities through extended dialogues, they introduce novel risks in multi-turn conversational scams that...
Benchmark HIGH
Songyang Liu, Chaozhuo Li, Rui Pu +5 more
Jailbreak attacks present a significant challenge to the safety of Large Language Models (LLMs), yet current automated evaluation methods largely...
4 months ago cs.CR cs.CL
PDF
Benchmark HIGH
Md Hasan Saju, Maher Muhtadi, Akramul Azim
The rapid advancement of Large Language Models (LLMs) presents new opportunities for automated software vulnerability detection, a crucial task in...
4 months ago cs.SE cs.AI
PDF
Benchmark HIGH
Jingyu Zhang
Customer-service LLM agents increasingly make policy-bound decisions (refunds, rebooking, billing disputes), but the same ``helpful'' interaction...
4 months ago cs.CR cs.HC
PDF
Benchmark HIGH
Manu, Yi Guo, Kanchana Thilakarathna +5 more
Large Language Models (LLMs) can be driven into over-generation, emitting thousands of tokens before producing an end-of-sequence (EOS) token. This...
4 months ago cs.CR cs.AI cs.LG
PDF
Benchmark HIGH
Woorim Han, Yeongjun Kwak, Miseon Yu +4 more
Learning-based automated vulnerability repair (AVR) techniques that utilize fine-tuned language models have shown promise in generating vulnerability...
Benchmark HIGH
Chinmay Pushkar, Sanchit Kabra, Dhruv Kumar +1 more
Large Language Models (LLMs) have demonstrated significant potential in automated software security, particularly in vulnerability detection....
4 months ago cs.CR cs.AI
PDF
Benchmark HIGH
Zhenlei Ye, Xiaobing Sun, Sicong Cao +2 more
The advances of large language models (LLMs) have paved the way for automated software vulnerability repair approaches, which iteratively refine the...
Benchmark HIGH
Liming Lu, Xiang Gu, Junyu Huang +5 more
Large Language Models (LLMs) are increasingly used in agentic systems, where their interactions with diverse tools and environments create complex,...
Benchmark HIGH
Zhang Wei, Peilu Hu, Zhenyuan Wei +16 more
The increasing deployment of large language models (LLMs) in safety-critical applications raises fundamental challenges in systematically evaluating...
4 months ago cs.CR cs.CL
PDF
Benchmark HIGH
Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid +1 more
In this fast-evolving area of LLMs, our paper discusses the significant security risk presented by prompt injection attacks. It focuses on small...
4 months ago cs.CR cs.AI
PDF
Benchmark HIGH
Chaomeng Lu, Bert Lagaisse
Vulnerability detection methods based on deep learning (DL) have shown strong performance on benchmark datasets, yet their real-world effectiveness...
5 months ago cs.CR cs.LG cs.SE
PDF
Benchmark HIGH
Devanshu Sahoo, Vasudev Majhi, Arjun Neekhra +3 more
The use of Large Language Models (LLMs) as automatic judges for code evaluation is becoming increasingly prevalent in academic environments. But...
5 months ago cs.SE cs.AI
PDF
Benchmark HIGH
Futa Waseda, Shojiro Yamabe, Daiki Shiono +2 more
Large vision-language models (LVLMs) are vulnerable to typographic attacks, where misleading text within an image overrides visual understanding....
Track AI security vulnerabilities in real time
Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act),
and CISO risk assessments for your AI/ML stack.
Start 14-Day Free Trial