Benchmark LOW
Vahideh Zolfaghari
Background: Large language models (LLMs) are increasingly deployed in medical consultations, yet their safety under realistic user pressures remains...
2 months ago cs.CL cs.AI
PDF
Benchmark LOW
Marc S. Montalvo, Hamed Yaghoobian
Recent advances in large language models (LLMs) are transforming data-intensive domains, with finance representing a high-stakes environment where...
3 months ago cs.MA cs.AI
PDF
Benchmark HIGH
Chinmay Pushkar, Sanchit Kabra, Dhruv Kumar +1 more
Large Language Models (LLMs) have demonstrated significant potential in automated software security, particularly in vulnerability detection....
3 months ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Yifan Huang, Xiaojun Jia, Wenbo Guo +4 more
Large language models (LLMs) have revolutionized software development through AI-assisted coding tools, enabling developers with limited programming...
3 months ago cs.CR cs.AI cs.SE
PDF
Benchmark MEDIUM
Jiashuo Liu, Jiayun Wu, Chunjie Wu +5 more
The rapid proliferation of Large Language Models (LLMs) and diverse specialized benchmarks necessitates a shift from fragmented, task-specific...
3 months ago cs.LG cs.AI cs.PF
PDF
Benchmark LOW
Miles Q. Li, Benjamin C. M. Fung, Martin Weiss +3 more
As autonomous AI agents are increasingly deployed in high-stakes environments, ensuring their safety and alignment with human values has become a...
Benchmark MEDIUM
Adam Elaoumari
The purpose of this project is to assess how well defenders can detect DNS-over-HTTPS (DoH) file exfiltration, and which evasion strategies can be...
3 months ago cs.CR cs.AI cs.NI
PDF
Benchmark HIGH
Zhenlei Ye, Xiaobing Sun, Sicong Cao +2 more
The advances of large language models (LLMs) have paved the way for automated software vulnerability repair approaches, which iteratively refine the...
Benchmark MEDIUM
Aaron Chan, Alex Ding, Frank Chen +3 more
The rapid integration of Large Language Models (LLMs) into decentralized physical infrastructure networks (DePIN) is currently bottlenecked by the...
Benchmark MEDIUM
Naseem Machlovi, Maryam Saleki, Ruhul Amin +5 more
As large language models (LLMs) become deeply embedded in daily life, the urgent need for safer moderation systems, distinguishing naive from...
3 months ago cs.CL cs.AI cs.HC
PDF
Benchmark MEDIUM
Naseem Machlovi, Maryam Saleki, Ruhul Amin +5 more
As large language models (LLMs) become deeply embedded in daily life, the urgent need for safer moderation systems that distinguish between naive and...
3 months ago cs.CL cs.AI cs.HC
PDF
Benchmark HIGH
Liming Lu, Xiang Gu, Junyu Huang +5 more
Large Language Models (LLMs) are increasingly used in agentic systems, where their interactions with diverse tools and environments create complex,...
Benchmark HIGH
Zhang Wei, Peilu Hu, Zhenyuan Wei +16 more
The increasing deployment of large language models (LLMs) in safety-critical applications raises fundamental challenges in systematically evaluating...
3 months ago cs.CR cs.CL
PDF
Benchmark MEDIUM
Sumanth Bharadwaj Hachalli Karanam, Dhiwahar Adhithya Kennady
Manual software beta testing is costly and time-consuming, while single-agent large language model (LLM) approaches suffer from hallucinations and...
3 months ago cs.SE cs.AI cs.MA
PDF
Benchmark MEDIUM
Scott Thornton
AI coding assistants produce vulnerable code in 45% of security-relevant scenarios [veracode2025], yet no public training dataset teaches both...
3 months ago cs.CR cs.AI cs.CL
PDF
Benchmark MEDIUM
Wei Qian, Chenxu Zhao, Yangyi Li +1 more
The rapid advancements in artificial intelligence (AI) have primarily focused on the process of learning from data to acquire knowledgeable learning...
3 months ago cs.LG cs.CR
PDF
Benchmark MEDIUM
Wang Bin, Ao Yang, Kedan Li +5 more
In the domain of software security testing, Directed Grey-Box Fuzzing (DGF) has garnered widespread attention for its efficient target localization...
3 months ago cs.SE cs.AI
PDF
Benchmark MEDIUM
Baolei Zhang, Minghong Fang, Zhuqing Liu +5 more
Federated Learning (FL) allows multiple clients to collaboratively train a model without sharing their private data. However, FL is vulnerable to...
3 months ago cs.CR cs.DC cs.LG
PDF
Benchmark HIGH
Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid +1 more
In this fast-evolving area of LLMs, our paper discusses the significant security risk presented by prompt injection attacks. It focuses on small...
3 months ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Saksham Sahai Srivastava, Haoyu He
Large Language Model (LLM) agents increasingly rely on long-term memory and Retrieval-Augmented Generation (RAG) to persist experiences and refine...
3 months ago cs.CR cs.AI cs.LG
PDF