Benchmark MEDIUM
Sai Puppala, Ismail Hossain, Md Jahangir Alam +5 more
Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety...
1 month ago cs.CR cs.AI
Benchmark MEDIUM
Kunal Pai, Parth Shah, Harshil Patel
AI agents are increasingly deployed in production, yet their security evaluations remain bottlenecked by manual red-teaming or static benchmarks that...
1 month ago cs.AI cs.MA
Benchmark MEDIUM
Xiang Li, Pin-Yu Chen, Wenqi Wei
With the rapid advancement and adoption of Audio Large Language Models (ALLMs), voice agents are now being deployed in high-stakes domains such as...
1 month ago cs.CR cs.MA
Benchmark MEDIUM
Qi Sun, Ahmed Abdo, Luis Burbano +4 more
Autonomous Vehicles (AVs), especially vision-based AVs, are rapidly being deployed without human operators. As AVs operate in safety-critical...
1 month ago cs.CR cs.LG
Benchmark MEDIUM
Haoyang Hu, Zhejun Jiang, Yueming Lyu +3 more
Retrieval-augmented generation (RAG) is increasingly deployed in real-world applications, where its reference-grounded design makes outputs appear...
1 month ago cs.CR cs.LG
Benchmark MEDIUM
Yi Liu, Zhihao Chen, Yanjun Zhang +5 more
Third-party agent skills extend LLM-based agents with instruction files and executable code that run on users' machines. Skills execute with user...
1 month ago cs.CR cs.AI cs.CL
Benchmark MEDIUM
Navita Goyal, Hal Daumé
Model steering, which involves intervening on hidden representations at inference time, has emerged as a lightweight alternative to finetuning for...
1 month ago cs.LG cs.AI cs.CL
Benchmark MEDIUM
José Ramón Pareja Monturiol, Juliette Sinnott, Roger G. Melko +1 more
Machine learning in clinical settings must balance predictive accuracy, interpretability, and privacy. Models such as logistic regression (LR) offer...
1 month ago cs.LG cs.CR quant-ph
Benchmark MEDIUM
Ruixin Yang, Ethan Mendes, Arthur Wang +4 more
Vision-language models (VLMs) have demonstrated strong performance in image geolocation, a capability further sharpened by frontier multimodal large...
1 month ago cs.CR cs.AI
Benchmark MEDIUM
Casey Ford, Madison Van Doren, Emily Dix
Multimodal large language models (MLLMs) are increasingly deployed in real-world systems, yet their safety under adversarial prompting remains...
1 month ago cs.CL cs.AI cs.HC
Benchmark MEDIUM
Debargha Ganguly, Sreehari Sankar, Biyao Zhang +8 more
Current approaches to LLM safety fundamentally rely on a brittle cat-and-mouse game of identifying and blocking known threats via guardrails. We...
1 month ago cs.CL cs.AI cs.DC
Benchmark MEDIUM
Omar Abdelnasser, Fatemah Alharbi, Khaled Khasawneh +2 more
Safety alignment in Language Models (LMs) is fundamental for trustworthy AI. However, while different stakeholders are trying to leverage Arabic...
1 month ago cs.CL cs.AI
Benchmark MEDIUM
Tomer Kordonsky, Maayan Yamin, Noam Benzimra +2 more
LLMs are increasingly used for code generation, but their outputs often follow recurring templates that can induce predictable vulnerabilities. We...
1 month ago cs.CR cs.AI
Benchmark MEDIUM
Najmul Hasan, Prashanth BusiReddyGari
The Uniform Resource Locator (URL), introduced in a connectivity-first era to define access and locate resources, remains historically limited,...
1 month ago cs.CR cs.AI
Benchmark MEDIUM
Rodrigo Tertulino, Ricardo Almeida, Laercio Alencar
The digitization of healthcare has generated massive volumes of Electronic Health Records (EHRs), offering unprecedented opportunities for training...
1 month ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Yen-Shan Chen, Zhi Rui Tam, Cheng-Kuang Wu +1 more
Current evaluations of LLM safety predominantly rely on severity-based taxonomies to assess the harmfulness of malicious queries. We argue that this...
1 month ago cs.CR cs.CL cs.CY
Benchmark MEDIUM
Max Manolov, Tony Gao, Siddharth Shukla +2 more
Large language models (LLMs) are increasingly used to assist developers with code, yet their implementations of cryptographic functionality often...
1 month ago cs.CR cs.AI
Benchmark MEDIUM
Abhilekh Borah, Shubhra Ghosh, Kedar Joshi +2 more
Tasks such as solving arithmetic equations, evaluating truth tables, and completing syllogisms are handled well by large language models (LLMs) in...
Benchmark MEDIUM
Evgeny Grigorenko, David Stanojević, David Ilić +2 more
Modern Integrated Development Environments (IDEs) increasingly leverage Large Language Models (LLMs) to provide advanced features like code...
1 month ago cs.CR cs.AI
Benchmark MEDIUM
Farnaz Soltaniani, Shoaib Razzaq, Mohammad Ghafari
Early detection of security bug reports (SBRs) is critical for timely vulnerability mitigation. We present an evaluation of prompt-based engineering...
1 month ago cs.CR cs.AI cs.LG