Benchmark MEDIUM
Pedro Conde, Henrique Branquinho, Valerio Mazzone +3 more
AI pentesting agents are increasingly credible as offensive security systems, but current benchmarks still provide limited guidance on which will...
Yesterday cs.AI cs.CR
Benchmark MEDIUM
Saba Pourhanifeh, AbdulAziz AbdulGhaffar, Ashraf Matrawy
Large Language Models (LLMs) are increasingly explored for cybersecurity applications such as vulnerability detection. In the domain of threat...
Yesterday cs.CR cs.AI
Benchmark MEDIUM
Qinghua Mao, Xi Lin, Jinze Gu +3 more
Large language models (LLMs) increasingly rely on knowledge editing to support knowledge-intensive reasoning, but this flexibility also introduces...
Yesterday cs.AI cs.CR
Benchmark MEDIUM
Xia Hu, Zhenrui Yue, Brian Potetz +4 more
As current Multimodal Large Language Models rapidly saturate canonical visual reasoning benchmarks, a key question emerges: do these strong scores...
Yesterday cs.CV cs.AI
Benchmark MEDIUM
Huy Hoang Ha, Benoit Favre, Francois Portet
Large language models (LLMs) have saturated standard medical benchmarks that test factual recall, yet their ability to perform higher-order...
2 days ago cs.CL cs.AI
Benchmark MEDIUM
Jingshen Zhang, Bo Wang, Yanlin Fu +4 more
In this paper, we study an emergent self-debiasing mechanism against stereotypical content in Large Language Models (LLMs). Unlike traditional...
Benchmark MEDIUM
Yilin Zhang, Yingkai Hua, Chunyu Wei +2 more
Vision-language model (VLM) based web agents demonstrate impressive autonomous GUI interaction but remain vulnerable to deceptive interface elements....
2 days ago cs.AI cs.CR
Benchmark MEDIUM
Di Lu, Bo Zhang, Xiyuan Li +5 more
Self-hosted computer-use agents (SHCUAs), such as OpenClaw, combine natural-language interaction with direct access to host-side resources, including...
Benchmark MEDIUM
Qinfeng Li, Yuntai Bao, Jianghui Hu +5 more
LLM agents rely on prompts to implement task-specific capabilities based on foundation LLMs, making agent prompts valuable intellectual property....
5 days ago cs.CR cs.AI
Benchmark MEDIUM
Christopher G. Pedraza Pohlenz, Hassan Jalil Hadi, Ali Hassan +1 more
LLMs are increasingly explored for malware analysis; however, current LLM-based malware attribution remains limited by unsupported indicators and...
5 days ago cs.CR cs.AI
Benchmark MEDIUM
Xiaomin Li, Andrzej Banburski-Fahey, Jaron Lanier
Auditing language-model outputs often requires more than judging correctness: an auditor may need to identify which source document most likely...
Benchmark MEDIUM
Dasol Choi, Eugenia Kim, Jaewon Noh +14 more
Current LLM safety benchmarks are predominantly English-centric and often rely on translation, failing to capture country-specific harms. Moreover,...
5 days ago cs.CL cs.AI
Benchmark MEDIUM
Chenglin Yang
Modern AI agents execute real-world side effects through tool calls such as file operations, shell commands, HTTP requests, and database queries. A...
6 days ago cs.AI cs.CR
Benchmark MEDIUM
Rishi Raj Sahoo, Jyotirmaya Shivottam, Subhankar Mishra
Regulatory frameworks such as GDPR increasingly require that ML predictions be accompanied by post-hoc explanations, even when raw data and trained...
1 week ago cs.LG cs.CR
Benchmark MEDIUM
Bikrant Bikram Pratap Maurya, Nitin Choudhury, Daksh Agarwal +1 more
Acoustic side-channel attacks (ASCA) on keyboards pose a significant security risk, as keystrokes can be inferred from typing acoustics, revealing...
1 week ago cs.CR cs.SD
Benchmark MEDIUM
Zuoyu Zhang, Yancheng Zhu
Tool-using agent systems powered by large language models (LLMs) are increasingly deployed across web, app, operating-system, and transactional...
Benchmark MEDIUM
Yuhui Wang, Tanqiu Jiang, Jiacheng Liang +2 more
As large language model (LLM)-powered agents are increasingly deployed to perform complex, real-world tasks, they face a growing class of attacks...
1 week ago cs.CR cs.AI cs.CL
Benchmark MEDIUM
Judith Sáinz-Pardo Díaz, Álvaro López García
The growing development of artificial intelligence based solutions, together with privacy legislation, has driven the rise of the so-called privacy...
1 week ago cs.CR cs.AI
Benchmark MEDIUM
Karima Makhlouf, Lamiaa Basyoni, Syed Khaderi +4 more
Large language models (LLMs) are increasingly deployed in interactive and retrieval-augmented settings, raising significant privacy concerns. While...
1 week ago cs.CR cs.AI
Benchmark MEDIUM
Debeshee Das, Julien Piet, Darya Kaviani +3 more
Memory systems enable otherwise-stateless LLM agents to persist user information across sessions, but also introduce a new attack surface. We...
1 week ago cs.CR cs.AI