
AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Dongrui Liu Yu Li Zhonghao Yang Peng Wang Guanxu Chen Yuejin Xie Qinghua Mao Wanying Qu Yanxu Zhu Tianyi Zhou Leitao Yuan Zhijie Zheng Qihao Lin Yimin Wang Haoyu Luo Shuai Shao Chen Qian Qingyu Liu Ling Tang Ruiyang Qin Qihan Ren Junxiao Yang Kun Wang Zhiheng Xi Linfeng Zhang Ranjie Duan Bo Zhang Wenjie Wang Wen Shen Qiaosheng Zhang Yan Teng Chaochao Lu Rui Mei Man Li Jialing Tao Xi Lin Tianhang Zheng Yong Liu Quanshi Zhang Lei Zhu Xingjun Ma Junhua Liu Hui Xue Xiaoxiang Zuo Xiangnan He Chao Shen Xianglong Liu Minlie Huang Jing Shao Xia Hu

cs.AI cs.CL cs.CR cs.CV cs.LG

Published

May 28, 2026

Updated

May 28, 2026

Links

PDF arXiv

Abstract

Modern open-world agents such as OpenClaw exhibit powerful cross-environment execution capabilities, yet they also introduce many new sources of safety risk. Meanwhile, advanced frontier AI models drastically lower the barrier to attack, rendering current agent alignment frameworks inadequate for real-world deployment. To tackle these emerging threats, we propose a lightweight and scalable agent safety alignment framework. Specifically, we update the agent safety taxonomy to cover emergent risks arising in Codex and OpenClaw execution scenarios. We further build a taxonomy-guided data engine with influence-function purification to train lightweight AgentDoG 1.5 variants (0.8B, 2B, 4B, and 8B parameters) using only about 1k training samples, achieving performance comparable to leading closed-source models (e.g., GPT-5.4). Based on AgentDoG 1.5, we construct a highly efficient agentic safety SFT and RL training environment, which reduces deployment overhead in Docker-level environments by two orders of magnitude. Finally, we deploy AgentDoG 1.5 as a training-free online guardrail for real-time safety moderation. Extensive experimental results indicate that AgentDoG 1.5 achieves state-of-the-art performance across diverse and complex interactive agentic scenarios. All models and datasets are openly released.
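The abstract does not spell out how influence-function purification scores training samples. As a rough illustration only, the sketch below uses a common first-order approximation (a single-checkpoint, TracIn-style gradient dot product) in place of the exact inverse-Hessian influence function, and a toy logistic-regression model in place of an LLM guardrail. A training example whose loss gradient aligns with the summed validation-loss gradient is estimated to lower validation loss when trained on, so it is kept; examples with a non-positive score are dropped. The function names influence_scores and purify, the zero threshold, and the toy data are hypothetical and are not taken from AgentDoG 1.5.

```python
import math
from typing import List, Tuple

# Hypothetical toy setup: (feature vector with a leading bias term, label in {0, 1}).
Example = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def grad_logloss(w: List[float], x: List[float], y: int) -> List[float]:
    """Gradient of binary cross-entropy w.r.t. the weights of a logistic model."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def dot(a: List[float], b: List[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def influence_scores(w: List[float], train: List[Example], val: List[Example]) -> List[float]:
    """First-order (TracIn-style, single checkpoint) influence estimate.

    A gradient step of size lr on a training example changes the summed
    validation loss by roughly -lr * (g_val . g_train), so a positive score
    means the example is estimated to help on the validation set.
    """
    g_val = [0.0] * len(w)
    for x, y in val:
        g_val = [a + b for a, b in zip(g_val, grad_logloss(w, x, y))]
    return [dot(g_val, grad_logloss(w, x, y)) for x, y in train]


def purify(w: List[float], train: List[Example], val: List[Example],
           threshold: float = 0.0) -> List[Example]:
    """Keep only examples whose estimated influence exceeds a hypothetical threshold."""
    scores = influence_scores(w, train, val)
    return [ex for ex, s in zip(train, scores) if s > threshold]


if __name__ == "__main__":
    w = [0.0, 0.0]  # toy "checkpoint" weights (bias, slope)
    val = [([1.0, 2.0], 1), ([1.0, -2.0], 0)]
    train = [([1.0, 1.5], 1), ([1.0, -1.5], 0), ([1.0, 1.8], 0)]  # last example is mislabeled
    print(influence_scores(w, train, val))
    print(purify(w, train, val))
```

In this toy run the two correctly labeled training points receive positive scores, while the deliberately mislabeled point ([1.0, 1.8] with label 0) receives a negative score and is filtered out. A real pipeline would compute these gradients from the trained model's parameters and might rank samples and prune a fixed fraction rather than apply a fixed zero threshold.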

Metadata

Comment: 44 pages, 12 figures, 9 tables
