Benchmark MEDIUM
Adam Kaufman, James Lucassen, Tyler Tracy +2 more
Future AI agents might run autonomously with elevated privileges. If these agents are misaligned, they might abuse these privileges to cause serious...
3 months ago cs.CR cs.AI
Benchmark MEDIUM
Xuanjun Zong, Zhiqi Shen, Lei Wang +2 more
Large language models (LLMs) are evolving into agentic systems that reason, plan, and operate external tools. The Model Context Protocol (MCP) is a...
3 months ago cs.CL cs.AI
Benchmark LOW
Edward Y. Chang
Large Language Models exhibit sycophancy: prioritizing agreeableness over correctness. Current remedies evaluate reasoning outcomes: RLHF rewards...
3 months ago cs.CL cs.AI
Benchmark LOW
Sahibpreet Singh, Shikha Dhiman
The integration of generative Artificial Intelligence into the digital ecosystem necessitates a critical re-evaluation of Indian criminal...
3 months ago cs.CR cs.AI cs.CY
Benchmark MEDIUM
Yihan Liao, Jacky Keung, Xiaoxue Ma +2 more
The rapid advancement of Large Language Models (LLMs) has been driven by extensive datasets that may contain sensitive information, raising serious...
Benchmark MEDIUM
Ruozhao Yang, Mingfei Cheng, Gelei Deng +3 more
Penetration testing is essential for assessing and strengthening system security against real-world threats, yet traditional workflows remain highly...
3 months ago cs.SE cs.AI cs.CR
Benchmark MEDIUM
Akhil Sharma, Shaikh Yaser Arafat, Jai Kumar Sharma +1 more
The increasing operational reliance on complex Multi-Agent Systems (MAS) across safety-critical domains necessitates rigorous adversarial robustness...
Benchmark MEDIUM
Ali Al Sahili, Ali Chehab, Razane Tajeddine
Large Language Models (LLMs) are prone to memorizing training data, which poses serious privacy risks. Two of the most prominent concerns are...
3 months ago cs.LG cs.CL cs.CR
Benchmark MEDIUM
Md Nahid Hasan Shuvo, Moinul Hossain
Connected autonomous vehicles (CAVs) rely on vision-based deep neural networks (DNNs) and low-latency Vehicle-to-Everything (V2X) communication to...
3 months ago cs.CV cs.AI cs.CR
Benchmark MEDIUM
Sanjay Das, Swastik Bhattacharya, Shamik Kundu +3 more
State-space models (SSMs), exemplified by the Mamba architecture, have recently emerged as state-of-the-art sequence-modeling frameworks, offering...
3 months ago cs.CR cs.LG
Benchmark MEDIUM
Luoxi Meng, Henry Feng, Ilia Shumailov +1 more
Browser-using agents (BUAs) are an emerging class of AI agents that interact with web browsers in human-like ways, including clicking, scrolling,...
3 months ago cs.CR cs.LG
Benchmark MEDIUM
Arastoo Zibaeirad, Marco Vieira
Large Language Models (LLMs) are increasingly being studied for Software Vulnerability Detection (SVD) and Repair (SVR). Individual LLMs have...
3 months ago cs.SE cs.AI
Benchmark MEDIUM
Xin Yang, Omid Ardakanian
Data obfuscation is a promising technique for mitigating attribute inference attacks by semi-trusted parties with access to time-series data emitted...
3 months ago cs.LG cs.CR
Benchmark MEDIUM
Edward Lue Chee Lip, Anthony Channg, Diana Kim +2 more
As AI capabilities advance, we increasingly rely on powerful models to decompose complex tasks – but what if the decomposer itself is...
3 months ago cs.CR cs.AI
Benchmark MEDIUM
Han Yang, Shaofeng Li, Tian Dong +3 more
Deep Neural Networks (DNNs), as valuable intellectual property, face unauthorized use. Existing protections, such as digital watermarking, are...
3 months ago cs.CR cs.LG
Benchmark HIGH
Chaomeng Lu, Bert Lagaisse
Vulnerability detection methods based on deep learning (DL) have shown strong performance on benchmark datasets, yet their real-world effectiveness...
3 months ago cs.CR cs.LG cs.SE
Benchmark MEDIUM
N Mangala, Murtaza Rangwala, S Aishwarya +5 more
Healthcare has become exceptionally sophisticated, as wearables and connected medical devices are revolutionising remote patient monitoring,...
3 months ago cs.CR cs.DC
Benchmark HIGH
Devanshu Sahoo, Vasudev Majhi, Arjun Neekhra +3 more
The use of Large Language Models (LLMs) as automatic judges for code evaluation is becoming increasingly prevalent in academic environments. But...
3 months ago cs.SE cs.AI
Benchmark LOW
Yash Srivastava, Shalin Jain, Sneha Awathare +1 more
The rising demand for collaborative machine learning and data analytics calls for secure and decentralized data sharing frameworks that balance...
3 months ago cs.CR cs.AI cs.DC
Benchmark MEDIUM
Jan Betley, Jorio Cocola, Dylan Feng +4 more
LLMs are useful because they generalize so well. But can you have too much of a good thing? We show that a small amount of finetuning in narrow...
3 months ago cs.CL cs.AI cs.CR