Benchmark MEDIUM
Rui Yin, Tianxu Han, Naen Xu +8 more
Safety-aligned large language models (LLMs) are increasingly deployed in real-world pipelines, yet this deployment also enlarges the supply-chain...
4 weeks ago cs.CR cs.CL
Benchmark MEDIUM
Pei-Yu Tseng, Lan Zhang, ZihDwo Yeh +3 more
Cyber Threat Intelligence (CTI) reports contain Indicators of Compromise (IOCs) that are critical for security operations. To operationalize these...
Benchmark MEDIUM
Ricardo Bessa, Rui Claro, João Trindade +1 more
Large Language Models (LLMs) are redefining offensive cybersecurity by allowing the generation of harmful machine code with minimal human...
Benchmark LOW
Javad M Alizadeh, Genhui Zheng, Chiu C Tan +7 more
People experiencing homelessness (PEH) face substantial barriers to accessing timely, accurate information about community services. DreamKG...
Benchmark MEDIUM
Hanbo Huang, Xuan Gong, Yiran Zhang +2 more
Large language model (LLM) watermarking has emerged as a promising approach for detecting and attributing AI-generated text, yet its robustness to...
Benchmark LOW
Jinhua Wang, Biswa Sengupta
Cross-language migration of large software systems is a persistent engineering challenge, particularly when the source codebase evolves rapidly. We...
4 weeks ago cs.SE cs.AI
Benchmark MEDIUM
Ricardo Bessa, Rui Claro, João Trindade +1 more
The application of Machine Learning techniques in code generation is now a common practice for most developers. Tools such as ChatGPT from OpenAI...
Benchmark LOW
Dzenan Hamzic, Florian Skopik, Max Landauer +2 more
Cyber threat intelligence (CTI) analysts must answer complex questions over large collections of narrative security reports. Retrieval-augmented...
4 weeks ago cs.AI cs.CR
Benchmark MEDIUM
Xiaomeng Hu, Yinger Zhang, Fei Huang +7 more
AI agents are expected to perform professional work across hundreds of occupational domains (from emergency department triage to nuclear reactor...
Benchmark MEDIUM
Yuchen Chen, Yuan Xiao, Chunrong Fang +2 more
The proliferation of large language models for code (CodeLMs) and open-source contributions has heightened concerns over unauthorized use of source...
Benchmark LOW
Wenbo Hu, Xin Chen, Yan Gao-Tian +3 more
Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal...
1 month ago cs.CV cs.AI cs.CL
Benchmark HIGH
Runpeng Geng, Chenlong Yin, Yanting Wang +2 more
Prompt injection attacks pose serious security risks across a wide range of real-world applications. While receiving increasing attention, the...
1 month ago cs.CR cs.AI cs.CL
Benchmark MEDIUM
Wenhao Yuan, Chenchen Lin, Jian Chen +3 more
In large language model (LLM) agents, reasoning trajectories are treated as reliable internal beliefs for guiding actions and updating memory....
1 month ago cs.AI cs.CL
Benchmark MEDIUM
Aryaman Arora, Zhengxuan Wu, Jacob Steinhardt +1 more
In language model interpretability research, circuit tracing aims to identify which internal features causally contributed to a particular...
Benchmark MEDIUM
Yu Liang, Liangxin Liu, Longzheng Wang +5 more
Generative reward models (GRMs) have emerged as a promising approach for aligning Large Language Models (LLMs) with human preferences by offering...
1 month ago cs.AI cs.CL cs.LG
Benchmark LOW
Hanyi Liu, Zhonghao Jiu, Minghao Wang +2 more
Implicit artistic influence, although visually plausible, is often undocumented and thus poses a historically constrained attribution problem:...
Benchmark MEDIUM
Yuanhang Li
Operating LEO mega-constellations requires translating high-level operator intents ("reroute financial traffic away from polar links under 80 ms")...
1 month ago cs.CR cs.AI
Benchmark HIGH
Phan The Duy, Nguyen Viet Duy, Khoa Ngo-Khanh +2 more
While recent approaches leverage large language models (LLMs) and multi-agent pipelines to automatically generate proof-of-concept (PoC) exploits...
Benchmark HIGH
Baoshun Tong, Haoran He, Ling Pan +2 more
Vision-Language-Action (VLA) models have achieved remarkable success in robotic manipulation. However, their robustness to linguistic nuances remains...
1 month ago cs.RO cs.CV
Benchmark MEDIUM
Geert Trooskens, Aaron Karlsberg, Anmol Sharma +6 more
We study compiled AI, a paradigm in which large language models generate executable code artifacts during a compilation phase, after which workflows...
1 month ago cs.SE cs.AI