Benchmark HIGH
Songyang Liu, Chaozhuo Li, Rui Pu +5 more
Jailbreak attacks present a significant challenge to the safety of Large Language Models (LLMs), yet current automated evaluation methods largely...
4 months ago cs.CR cs.CL
Benchmark MEDIUM
Muntasir Adnan, Carlos C. N. Kuhn
Large Language Models have become integral to software development, yet they frequently generate vulnerable code. Existing code vulnerability...
4 months ago cs.SE cs.AI
Benchmark MEDIUM
Zhuoran Tan, Run Hao, Jeremy Singer +2 more
Tool-augmented LLM agents raise new security risks: tool executions can introduce runtime-only behaviors, including prompt injection and unintended...
4 months ago cs.CR cs.SE
Benchmark MEDIUM
Milad Rahmati, Nima Rahmati
The proliferation of Internet of Things devices in critical infrastructure has created unprecedented cybersecurity challenges, necessitating...
4 months ago cs.CR cs.LG
Benchmark MEDIUM
Muhammad Bilal, Omer Tariq, Hasan Ahmed
Timing and burst patterns can leak through encryption, and an adaptive adversary can exploit them. This undermines metadata-only detection in a...
4 months ago cs.CR cs.LG cs.NI
Benchmark LOW
Sixue Xing, Xuanye Xia, Kerui Wu +3 more
Clinical trial failure remains a central bottleneck in drug development, where minor protocol design flaws can irreversibly compromise outcomes...
4 months ago cs.AI cs.MA
Benchmark HIGH
Md Hasan Saju, Maher Muhtadi, Akramul Azim
The rapid advancement of Large Language Models (LLMs) presents new opportunities for automated software vulnerability detection, a crucial task in...
4 months ago cs.SE cs.AI
Benchmark MEDIUM
Yiming Liang, Yizhi Li, Yantao Du +14 more
Benchmarks play a crucial role in tracking the rapid advancement of large language models (LLMs) and identifying their capability boundaries....
4 months ago cs.CL cs.AI
Benchmark MEDIUM
Bohan Liang, Zijian Chen, Qi Jia +3 more
Stock prediction, a subject closely related to people's investment activities in fully dynamic and live environments, has been widely studied....
4 months ago q-fin.ST cs.LG
Benchmark MEDIUM
Muhammad Abdullahi Said, Muhammad Sammani Sani
As Large Language Models (LLMs) integrate into critical global infrastructure, the assumption that safety alignment transfers zero-shot from English...
4 months ago cs.CL cs.AI cs.CY
Benchmark HIGH
Jingyu Zhang
Customer-service LLM agents increasingly make policy-bound decisions (refunds, rebooking, billing disputes), but the same "helpful" interaction...
4 months ago cs.CR cs.HC
Benchmark MEDIUM
Zhe Huang, Hao Wen, Aiming Hao +6 more
Multimodal Large Language Models (MLLMs) have made remarkable progress in video understanding. However, they suffer from a critical vulnerability: an...
4 months ago cs.CV cs.AI
Benchmark MEDIUM
Heba Osama, Omar Elebiary, Youssef Qassim +4 more
Web applications increasingly face evasive and polymorphic attack payloads, yet traditional web application firewalls (WAFs) based on static rule...
Benchmark HIGH
Manu, Yi Guo, Kanchana Thilakarathna +5 more
Large Language Models (LLMs) can be driven into over-generation, emitting thousands of tokens before producing an end-of-sequence (EOS) token. This...
4 months ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Karolina Korgul, Yushi Yang, Arkadiusz Drohomirecki +7 more
Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance...
4 months ago cs.HC cs.AI cs.MA
Benchmark LOW
Kerem Zaman, Shashank Srivastava
Recent work, using the Biasing Features metric, labels a CoT as unfaithful if it omits a prompt-injected hint that affected the prediction. We argue...
4 months ago cs.CL cs.AI cs.LG
Benchmark HIGH
Woorim Han, Yeongjun Kwak, Miseon Yu +4 more
Learning-based automated vulnerability repair (AVR) techniques that utilize fine-tuned language models have shown promise in generating vulnerability...
Benchmark LOW
Vahideh Zolfaghari
Background: Large language models (LLMs) are increasingly deployed in medical consultations, yet their safety under realistic user pressures remains...
4 months ago cs.CL cs.AI
Benchmark LOW
Marc S. Montalvo, Hamed Yaghoobian
Recent advances in large language models (LLMs) are transforming data-intensive domains, with finance representing a high-stakes environment where...
4 months ago cs.MA cs.AI
Benchmark HIGH
Chinmay Pushkar, Sanchit Kabra, Dhruv Kumar +1 more
Large Language Models (LLMs) have demonstrated significant potential in automated software security, particularly in vulnerability detection....
4 months ago cs.CR cs.AI