Benchmark MEDIUM
Mohamed Shaaban, Mohamed Elmahallawy
Federated learning (FL) enables collaborative training across organizational silos without sharing raw data, making it attractive for...
1 month ago cs.CR cs.CL
Benchmark MEDIUM
Anudeep Das, Prach Chantasantitam, Gurjot Singh +3 more
Large language models (LLMs) are increasingly deployed in settings where inducing a bias toward a certain topic can have significant consequences,...
1 month ago cs.CR cs.AI
Benchmark MEDIUM
Xu Li, Simon Yu, Minzhou Pan +5 more
LLM-based agents are becoming increasingly capable, yet their safety lags behind. This creates a gap between what agents can do and should do. This...
1 month ago cs.CR cs.AI cs.CL
Benchmark MEDIUM
Tailia Malloy, Tegawende F. Bissyande
Large Language Models are expanding beyond being a tool humans use and into independent agents that can observe an environment, reason about...
1 month ago cs.CR cs.AI
Benchmark MEDIUM
Nataša Krčo, Zexi Yao, Matthieu Meeus +1 more
Data containing personal information is increasingly used to train, fine-tune, or query Large Language Models (LLMs). Text is typically scrubbed of...
1 month ago cs.CL cs.AI cs.CR
Benchmark LOW
Rosie Zhao, Anshul Shah, Xiaoyu Zhu +5 more
Reinforcement learning (RL) fine-tuning has become a key technique for enhancing large language models (LLMs) on reasoning-intensive tasks,...
Benchmark HIGH
André Storhaug, Jiamou Sun, Jingyue Li
Identifying vulnerability-fixing commits corresponding to disclosed CVEs is essential for secure software maintenance but remains challenging at...
1 month ago cs.SE cs.AI cs.CR
Benchmark MEDIUM
Faouzi El Yagoubi, Ranwa Al Mallah, Godwin Badu-Marfo
Multi-agent Large Language Model (LLM) systems create privacy risks that current benchmarks cannot measure. When agents coordinate on tasks,...
Benchmark MEDIUM
Aashish Kolluri, Rishi Sharma, Manuel Costa +5 more
Indirect prompt injection attacks threaten AI agents that execute consequential actions, motivating deterministic system-level defenses. Such...
1 month ago cs.CR cs.LG
Benchmark LOW
Yang Liu, Armstrong Foundjem, Xingfang Wu +2 more
Context: In the fast-paced evolution of software development, Large Language Models (LLMs) have become indispensable tools for tasks such as code...
Benchmark MEDIUM
Arpit Singh Gautam, Kailash Talreja, Saurabh Jha
Large Language Models (LLMs) frequently hallucinate plausible but incorrect assertions, a vulnerability often missed by uncertainty metrics when...
1 month ago cs.CL cs.AI
Benchmark MEDIUM
Zhenhua Zou, Sheng Guo, Qiuyang Zhan +6 more
The evolution of Large Language Models (LLMs) has shifted mobile computing from App-centric interactions to system-level autonomous agents. Current...
1 month ago cs.CR cs.AI
Benchmark MEDIUM
Xinguo Feng, Zhongkui Ma, Zihan Wang +2 more
Training and fine-tuning large-scale language models largely benefit from collaborative learning, but the approach has been proven vulnerable to...
Benchmark MEDIUM
Matteo Migliarini, Berat Ercevik, Oluwagbemike Olowe +5 more
Large Language Models (LLMs) are increasingly deployed as active participants on public social media platforms, yet their behavior in these...
1 month ago cs.SI cs.CY
Benchmark MEDIUM
Yuxin Cao, Wei Song, Shangzhi Xu +2 more
Video Large Language Models (VideoLLMs) have recently achieved strong performance in video understanding tasks. However, we identify a previously...
1 month ago cs.CV cs.CR cs.MM
Benchmark MEDIUM
Mohan Rajagopalan, Vinay Rao
Large Language Model (LLM) applications are vulnerable to prompt injection and context manipulation attacks that traditional security models cannot...
1 month ago cs.CR cs.AI cs.MA
Benchmark LOW
Yilin Yang, Zhenghui Guo, Yuke Wang +3 more
Large Vision-Language Models (VLMs) have achieved remarkable success across diverse multimodal tasks but remain vulnerable to hallucinations rooted...
Benchmark HIGH
Adriana Alvarado Garcia, Ruyuan Wan, Ozioma C. Oguine +1 more
Recently, red teaming, with roots in security, has become a key evaluative approach to ensure the safety and reliability of Generative Artificial...
1 month ago cs.CY cs.AI cs.CL
Benchmark LOW
Pei-Chi Pan, Yingbin Liang, Sen Lin
Large Language Models (LLMs) demonstrate transformative potential, yet their reasoning remains inconsistent and unreliable. Reinforcement learning...
Benchmark HIGH
Chaeyun Kim, YongTaek Lim, Kihyun Kim +2 more
Existing red-teaming benchmarks, when adapted to new languages via direct translation, fail to capture socio-technical vulnerabilities rooted in...
1 month ago cs.CY cs.AI