Benchmark LOW
Rahul Baxi
AI agents are increasingly granted economic agency (executing trades, managing budgets, negotiating contracts, and spawning sub-agents), yet current...
Benchmark LOW
Yashas Hariprasad, Subhash Gurappa, Sundararaj S. Iyengar +3 more
The Forensics Investigations Network in Digital Sciences (FINDS) Research Center of Excellence (CoE), funded by the U.S. Army Research Laboratory,...
2 months ago cs.CR cs.AI
Benchmark HIGH
Zhicheng Fang, Jingjie Zheng, Chenxu Fu +1 more
Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare...
2 months ago cs.CR cs.AI cs.CL
Benchmark HIGH
Xuhui Dou, Hayretdin Bahsi, Alejandro Guerra-Manzanares
Recent work applies Large Language Models (LLMs) to source-code vulnerability detection, but most evaluations still rely on random train-test splits...
2 months ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Chung-ju Huang, Huiqiang Zhao, Yuanpeng He +5 more
The increasing reliance on cloud-hosted Large Language Models (LLMs) exposes sensitive client data, such as prompts and responses, to potential...
2 months ago cs.CR cs.AI cs.CL
Benchmark MEDIUM
David Condrey
The proliferation of AI-generated text has intensified the need for reliable authorship verification, yet current output-based methods are...
2 months ago cs.CR cs.HC cs.LG
Benchmark LOW
Zhengqing Yuan, Kaiwen Shi, Zheyuan Zhang +3 more
Scientific research relies on accurate citation for attribution and integrity, yet large language models (LLMs) introduce a new risk: fabricated...
2 months ago cs.CL cs.DL
Benchmark LOW
Yuan Liang, Ruobin Zhong, Haoming Xu +46 more
Current AI agents can flexibly invoke tools and execute complex tasks, yet their long-term advancement is hindered by the lack of systematic...
2 months ago cs.AI cs.CL cs.CV
Benchmark LOW
Jiazheng Quan, Xiaodong Li, Bin Wang +5 more
Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities....
2 months ago cs.CR cs.AI cs.SE
Benchmark LOW
Balazs Pejo
Federated learning offers a privacy-friendly collaborative learning framework, yet its success, like any joint venture, hinges on the contributions...
2 months ago cs.LG cs.CR
Benchmark LOW
Vladimer Khasia
The pursuit of world model based artificial intelligence has predominantly relied on projecting high-dimensional observations into parameterized...
Benchmark MEDIUM
Nazanin Mohammadi Sepahvand, Eleni Triantafillou, Hugo Larochelle +3 more
Large language models (LLMs) trained on web-scale data can produce toxic outputs, raising concerns for safe deployment. Prior defenses, based on...
Benchmark LOW
Mohammed Cherifi
Public EV charging infrastructure suffers from significant failure rates -- with field studies reporting up to 27.5% of DC fast chargers...
2 months ago cs.DC cs.AI cs.LG
Benchmark MEDIUM
Guangnian Wan, Qi Li, Gongfan Fang +2 more
Multimodal Diffusion Language Models (MDLMs) have recently emerged as a competitive alternative to their autoregressive counterparts. Yet their...
2 months ago cs.CR cs.LG
Benchmark MEDIUM
Longxiang Wang, Xiang Zheng, Xuhao Zhang +3 more
Multi-tenant LLM serving frameworks widely adopt shared Key-Value caches to enhance efficiency. However, this creates side-channel vulnerabilities...
2 months ago cs.CR cs.AI
Benchmark MEDIUM
Lei Ba, Qinbin Li, Songze Li
LLM-based code interpreter agents are increasingly deployed in critical workflows, yet their robustness against risks introduced by their code...
Benchmark MEDIUM
Jingwei Shi, Xinxiang Yin, Jing Huang +2 more
The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing...
2 months ago cs.SE cs.AI cs.CR
Benchmark MEDIUM
Abdullah Caglar Oksuz, Anisa Halimi, Erman Ayday
Membership inference attacks (MIAs) threaten the privacy of machine learning models by revealing whether a specific data point was used during...
2 months ago cs.LG cs.CR
Benchmark LOW
Martin Bertran, Riccardo Fogliato, Zhiwei Steven Wu
Empirical conclusions depend not only on data but on analytic decisions made throughout the research process. Many-analyst studies have quantified...
2 months ago cs.AI cs.LG
Benchmark HIGH
Mirae Kim, Seonghun Jeong, Youngjun Kwak
Jailbreaking poses a significant risk to the deployment of Large Language Models (LLMs) and Vision Language Models (VLMs). VLMs are particularly...
2 months ago cs.CL cs.AI cs.DB