CALYREX: Cross-Attention LaYeR EXtended Transformers for System Prompt Anchoring
Abstract
Modern large language models (LLMs) rely on system prompts to establish behavioral constraints and safety rules. Standard causal self-attention, however, treats privileged instructions and untrusted user content with equal structural priority, a mismatch that leaves models vulnerable to prompt injection and instruction erosion over extended contexts. We propose CALYREX (Cross-Attention LaYeR EXtended transformers), which uses cross-attention between the user input and the system prompt to structurally isolate and anchor privileged instructions. A placement ablation on a 1.5B backbone identifies insertion at the final eighth of layers as optimal, confirmed by mechanistic activation analysis showing that behavioral constraints naturally concentrate there. At 8B scale, controlling for training data, backbone, and parameter budget, CALYREX yields $+7.4\%$ on instruction following (IFEval) and $+16.3\%$ on multi-turn instruction adherence, while reducing the many-shot jailbreaking attack success rate by $13\%$. This advantage appears to widen with model scale, consistent with larger models making more effective use of the dedicated routing pathway.
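The excerpt does not include the authors' implementation. As a minimal sketch of the core idea only (all names, shapes, and the single-head simplification are assumptions, not the paper's code), a cross-attention layer in which queries come from user-content hidden states and keys/values come from the system-prompt hidden states could look like:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def system_cross_attention(user_h, sys_h, Wq, Wk, Wv):
    """Single-head cross-attention for system-prompt anchoring (illustrative).

    Queries are computed from untrusted user/content token states; keys and
    values come only from the privileged system-prompt states, giving the
    system prompt a dedicated routing pathway into every user position.

    user_h: (T_user, d)  hidden states of user/content tokens
    sys_h:  (T_sys, d)   hidden states of the system prompt
    Wq, Wk, Wv: (d, d)   projection matrices
    """
    Q = user_h @ Wq                             # (T_user, d)
    K = sys_h @ Wk                              # (T_sys, d)
    V = sys_h @ Wv                              # (T_sys, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])     # (T_user, T_sys)
    attn = softmax(scores, axis=-1)             # each user token attends
    return attn @ V                             # (T_user, d) system-conditioned update

# Toy example: 5 user tokens, 3 system-prompt tokens, width 16.
rng = np.random.default_rng(0)
d, T_user, T_sys = 16, 5, 3
out = system_cross_attention(
    rng.normal(size=(T_user, d)), rng.normal(size=(T_sys, d)),
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)),
)
```

Per the abstract's placement ablation, such layers would be inserted only in the final eighth of the transformer's blocks; the sketch above shows a single layer in isolation.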
Metadata
- Comments: Preprint. 25 pages, 4 figures, 9 tables