Defense MEDIUM
Krishiv Agarwal, Ramneet Kaur, Colin Samplawski +6 more
Effective safety auditing of large language models (LLMs) demands tools that go beyond black-box probing and systematically uncover vulnerabilities...
2 weeks ago cs.CR cs.LG
PDF
Benchmark MEDIUM
Hoang Nguyen, Lu Wang, Marta Gaia Bras
Freight brokerages negotiate thousands of carrier rates daily under dynamic pricing conditions where models frequently revise targets...
2 weeks ago cs.MA cs.AI cs.CL
PDF
Attack MEDIUM
Abhijit Talluri
Adversarial robustness evaluation underpins every claim of trustworthy ML deployment, yet the field suffers from fragmented protocols and undetected...
2 weeks ago cs.CR cs.LG
PDF
Other MEDIUM
Yuhang Wu, Qinyuan Liu, Qiuyang Zhao +1 more
Currently, Large Language Models (LLMs) feature a diversified architectural landscape, including traditional Transformer, GateDeltaNet, and Mamba....
2 weeks ago cs.CL cs.AI
PDF
Attack HIGH
Nandakrishna Giri, Asmitha K. A., Serena Nicolazzo +2 more
Machine learning-based static malware detectors remain vulnerable to adversarial evasion techniques, such as metamorphic engine mutations. To address...
2 weeks ago cs.CR cs.LG
PDF
Attack HIGH
Pranav Pallerla, Wilson Naik Bhukya, Bharath Vemula +1 more
Retrieval-augmented generation (RAG) systems are increasingly deployed in sensitive domains such as healthcare and law, where they rely on private,...
2 weeks ago cs.CR cs.AI
PDF
Tool LOW
Yingyong Hou, Xinyuan Lao, Huimei Wang +10 more
Background: Agent skills are increasingly deployed as modular, reusable capability units in AI agent systems. Medical research agent skills require...
Defense MEDIUM
Chao Pan, Yu Wu, Xin Yao
Internal Safety Collapse (ISC) is a failure mode in which frontier LLMs, when executing legitimate professional tasks whose correct completion...
2 weeks ago cs.CR cs.AI cs.LG
PDF
Benchmark MEDIUM
He Yang Yuan, Xin Wang, Kundi Yao +3 more
Logging code plays an important role in software systems by recording key events and behaviors, which are essential for debugging and monitoring....
3 weeks ago cs.SE cs.AI cs.CR
PDF
Defense HIGH
Ronghao Ni, Mihai Christodorescu, Limin Jia
The rapidly evolving Node.js ecosystem currently includes millions of packages and is a critical part of modern software supply chains, making...
3 weeks ago cs.CR cs.AI cs.SE
PDF
Survey LOW
Patrick Vossler, Jean Feng, Venkat Sivaraman +9 more
Hospital Quality Improvement (QI) plays a critical role in optimizing healthcare delivery by translating high-level hospital goals into actionable...
3 weeks ago cs.AI cs.HC
PDF
Benchmark MEDIUM
Girish, Mohd Mujtaba Akhtar, Orchid Chetia Phukan +1 more
The rapid advancement of Audio Large Language Models (ALMs), driven by Neural Audio Codecs (NACs), has led to the emergence of highly realistic...
Benchmark MEDIUM
Robert Stanley, Avi Verma, Lillian Tsai +2 more
AI agents promise to serve as general-purpose personal assistants for their users, which requires them to have access to private user data (e.g.,...
3 weeks ago cs.CR cs.AI cs.OS
PDF
Benchmark MEDIUM
Alankrit Chona, Igor Kozlov, Ambuj Kumar
We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of...
3 weeks ago cs.CR cs.AI
PDF
Defense MEDIUM
Divyesh Gabbireddy, Suman Saha
Cross-site scripting (XSS) remains a persistent web security vulnerability, especially because obfuscation can change the surface form of a malicious...
3 weeks ago cs.CR cs.LG cs.SE
PDF
Defense MEDIUM
Sarang Nambiar, Dhruv Pradhan, Ezekiel Soremekun
Pre-trained machine learning models (PTMs) are commonly provided via Model Hubs (e.g., Hugging Face) in standard formats like Pickles to facilitate...
3 weeks ago cs.CR cs.SE
PDF
Benchmark MEDIUM
Ali Al-Kaswan, Maksim Plotnikov, Maxim Hájek +3 more
Large Language Model (LLM) agents are increasingly proposed for autonomous cybersecurity tasks, but their capabilities in realistic offensive...
3 weeks ago cs.AI cs.CR cs.SE
PDF
Tool HIGH
Jiamin Chang, Minhui Xue, Ruoxi Sun +3 more
Recent advances in embodied Vision-Language Agentic Systems (VLAS), powered by large vision-language models (LVLMs), enable AI systems to perceive...
3 weeks ago cs.CV cs.AI
PDF
Other LOW
Mikako Bito, Keita Nishimoto, Kimitaka Asatani +1 more
The conformity bias exhibited by large language models (LLMs) can pose a significant challenge to decision-making in LLM-based multi-agent systems...
3 weeks ago cs.AI cs.MA cs.NE
PDF