Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal...
Vision-Language-Action (VLA) models have achieved remarkable success in robotic manipulation. However, their robustness to linguistic nuances remains...
Geert Trooskens, Aaron Karlsberg, Anmol Sharma +6 more
We study compiled AI, a paradigm in which large language models generate executable code artifacts during a compilation phase, after which workflows...
Inference-time compute scaling has emerged as a powerful paradigm for improving language model performance on a wide range of tasks, but the question...
Abinitha Gourabathina, Inkit Padhi, Manish Nagireddy +2 more
For Large Language Models (LLMs) to be reliably deployed, they must know when not to answer, that is, when to abstain. Reasoning models, in particular,...
Retrieval-Augmented Language Models (RALMs) have demonstrated significant potential in knowledge-intensive tasks; however, they remain vulnerable to...
Matteo Migliarini, Joaquin Pereira Pizzini, Luca Moresca +3 more
Instrumental convergence predicts that sufficiently advanced AI agents will resist shutdown, yet current safety training (RLHF) may obscure this risk...