Benchmark MEDIUM
Zhiyang Dai, Yansong Gao, Boyu Kuang +5 more
Contrastive learning (CL) reduces annotation cost via auto-derived supervisory signals. Since large-scale in-house CL datasets are infeasible,...
1 week ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Huining Cui, Wei Liu
Retrieval-augmented generation (RAG) improves factual grounding by conditioning large language models on retrieved evidence, but it also opens a...
1 week ago cs.CR cs.DB
PDF
Benchmark MEDIUM
Zehui Tang, Yuchen Liu, Feihu Huang
Federated learning (FL) is a popular distributed learning paradigm in machine learning, which enables multiple clients to collaboratively train...
1 week ago cs.LG cs.AI cs.CR
PDF
Benchmark MEDIUM
Zhijun Li, Minghui Xu, Huayi Qi +6 more
Retrieval-Augmented Generation (RAG) is essential for enhancing Large Language Models (LLMs) with external knowledge, but its reliance on cloud...
Benchmark MEDIUM
Kemal Bicakci
Public agencies are beginning to consider large language models (LLMs) as decision-support tools for grant evaluation. This creates a practical...
2 weeks ago cs.CR cs.AI cs.CY
PDF
Benchmark MEDIUM
Runze Cui, Fangxin Shang, Yehui Yang +2 more
Document understanding is a critical capability in financial credit review, onboarding, and remote verification, where both decision accuracy and...
2 weeks ago cs.CV cs.CE cs.MM
PDF
Benchmark MEDIUM
Yuanfan Li, Qi Zhou, Chengzhengxu Li +5 more
We present MGTEVAL, an extensible platform for systematic evaluation of Machine-Generated Text (MGT) detectors. Despite rapid progress in MGT...
2 weeks ago cs.CR cs.CL
PDF
Benchmark MEDIUM
Aaron J. Li, Nicolas Sanchez, Hao Huang +8 more
Large language models (LLMs) are increasingly deployed, yet their outputs can be highly sensitive to routine, non-adversarial variation in how users...
2 weeks ago cs.CL cs.AI
PDF
Benchmark MEDIUM
Qi Li, Jiu Li, Pingtao Wei +8 more
This report presents a comparative evaluation of DKnownAI Guard in AI agent security scenarios, benchmarked against three competing products: AWS...
2 weeks ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Pablo Mateo-Torrejón, Alfonso Sánchez-Macián
The rapid integration of Large Language Models (LLMs) into Multi-Agent Systems (MAS) has significantly enhanced their collaborative problem-solving...
2 weeks ago cs.CR cs.AI cs.MA
PDF
Benchmark MEDIUM
Zijun Feng, Yuming Feng, Yu Wang +4 more
Cross-chain bridges, the critical infrastructure of the multi-chain ecosystem, have become a primary target for attackers, resulting in over $2.8...
Benchmark MEDIUM
Víctor Mayoral-Vilches, María Sanz-Gómez, Francesco Balassone +6 more
As LLM-driven agents advance in cybersecurity, Jeopardy CTF benchmarks are approaching saturation and cyber ranges, the natural next evaluation...
Benchmark MEDIUM
Eungyu Woo, Yooshin Kim, Wonje Heo +1 more
Industrial Control Systems (ICS) integrate computing, physical processes, and communication to operate critical infrastructures such as power grids,...
Benchmark MEDIUM
Qi Li, Bo Yin, Weiqi Huang +6 more
Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety...
Benchmark MEDIUM
Yuchen Shi, Xin Guo, Huajie Chen +3 more
Poisoning-based backdoor attacks pose significant threats to deep neural networks by embedding triggers in training data, causing models to...
2 weeks ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Vishal Rajput
We prove that empirical risk minimisation (ERM) imposes a necessary geometric constraint on learned representations: any encoder that minimises...
2 weeks ago cs.LG cs.AI cs.CV
PDF
Benchmark MEDIUM
Ari Azarafrooz
AI-agent guardrails are memoryless: each message is judged in isolation, so an adversary who spreads a single attack across dozens of sessions slips...
2 weeks ago cs.CR cs.AI cs.CL
PDF
Benchmark MEDIUM
Mohammad Farhad, Shuvalaxmi Dass
Software security relies on effective vulnerability detection and patching, yet determining whether a patch fully eliminates risk remains an...
2 weeks ago cs.SE cs.CR
PDF
Benchmark MEDIUM
Hoang Nguyen, Lu Wang, Marta Gaia Bras
Freight brokerages negotiate thousands of carrier rates daily under dynamic pricing conditions where models frequently revise targets...
2 weeks ago cs.MA cs.AI cs.CL
PDF