Benchmark MEDIUM
Muhammad Bilal, Omer Tariq, Hasan Ahmed
Timing and burst patterns can leak through encryption, and an adaptive adversary can exploit them. This undermines metadata-only detection in a...
4 months ago cs.CR cs.LG cs.NI
Benchmark LOW
Sixue Xing, Xuanye Xia, Kerui Wu +3 more
Clinical trial failure remains a central bottleneck in drug development, where minor protocol design flaws can irreversibly compromise outcomes...
4 months ago cs.AI cs.MA
Benchmark HIGH
Md Hasan Saju, Maher Muhtadi, Akramul Azim
The rapid advancement of Large Language Models (LLMs) presents new opportunities for automated software vulnerability detection, a crucial task in...
4 months ago cs.SE cs.AI
Benchmark MEDIUM
Yiming Liang, Yizhi Li, Yantao Du +14 more
Benchmarks play a crucial role in tracking the rapid advancement of large language models (LLMs) and identifying their capability boundaries....
4 months ago cs.CL cs.AI
Benchmark MEDIUM
Bohan Liang, Zijian Chen, Qi Jia +3 more
Stock prediction, a subject closely related to people's investment activities in fully dynamic and live environments, has been widely studied....
4 months ago q-fin.ST cs.LG
Benchmark MEDIUM
Muhammad Abdullahi Said, Muhammad Sammani Sani
As Large Language Models (LLMs) integrate into critical global infrastructure, the assumption that safety alignment transfers zero-shot from English...
4 months ago cs.CL cs.AI cs.CY
Benchmark HIGH
Jingyu Zhang
Customer-service LLM agents increasingly make policy-bound decisions (refunds, rebooking, billing disputes), but the same "helpful" interaction...
4 months ago cs.CR cs.HC
Benchmark MEDIUM
Zhe Huang, Hao Wen, Aiming Hao +6 more
Multimodal Large Language Models (MLLMs) have made remarkable progress in video understanding. However, they suffer from a critical vulnerability: an...
4 months ago cs.CV cs.AI
Benchmark MEDIUM
Heba Osama, Omar Elebiary, Youssef Qassim +4 more
Web applications increasingly face evasive and polymorphic attack payloads, yet traditional web application firewalls (WAFs) based on static rule...
Benchmark HIGH
Manu, Yi Guo, Kanchana Thilakarathna +5 more
Large Language Models (LLMs) can be driven into over-generation, emitting thousands of tokens before producing an end-of-sequence (EOS) token. This...
4 months ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Karolina Korgul, Yushi Yang, Arkadiusz Drohomirecki +7 more
Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance...
4 months ago cs.HC cs.AI cs.MA
Benchmark LOW
Kerem Zaman, Shashank Srivastava
Recent work, using the Biasing Features metric, labels a CoT as unfaithful if it omits a prompt-injected hint that affected the prediction. We argue...
4 months ago cs.CL cs.AI cs.LG
Benchmark HIGH
Woorim Han, Yeongjun Kwak, Miseon Yu +4 more
Learning-based automated vulnerability repair (AVR) techniques that utilize fine-tuned language models have shown promise in generating vulnerability...
Benchmark LOW
Vahideh Zolfaghari
Background: Large language models (LLMs) are increasingly deployed in medical consultations, yet their safety under realistic user pressures remains...
4 months ago cs.CL cs.AI
Benchmark LOW
Marc S. Montalvo, Hamed Yaghoobian
Recent advances in large language models (LLMs) are transforming data-intensive domains, with finance representing a high-stakes environment where...
4 months ago cs.MA cs.AI
Benchmark HIGH
Chinmay Pushkar, Sanchit Kabra, Dhruv Kumar +1 more
Large Language Models (LLMs) have demonstrated significant potential in automated software security, particularly in vulnerability detection....
4 months ago cs.CR cs.AI
Benchmark MEDIUM
Yifan Huang, Xiaojun Jia, Wenbo Guo +4 more
Large language models (LLMs) have revolutionized software development through AI-assisted coding tools, enabling developers with limited programming...
4 months ago cs.CR cs.AI cs.SE
Benchmark MEDIUM
Jiashuo Liu, Jiayun Wu, Chunjie Wu +5 more
The rapid proliferation of Large Language Models (LLMs) and diverse specialized benchmarks necessitates a shift from fragmented, task-specific...
4 months ago cs.LG cs.AI cs.PF
Benchmark LOW
Miles Q. Li, Benjamin C. M. Fung, Martin Weiss +3 more
As autonomous AI agents are increasingly deployed in high-stakes environments, ensuring their safety and alignment with human values has become a...
Benchmark MEDIUM
Adam Elaoumari
The purpose of this project is to assess how well defenders can detect DNS-over-HTTPS (DoH) file exfiltration, and which evasion strategies can be...
4 months ago cs.CR cs.AI cs.NI