LM-Fix: Lightweight Bit-Flip Detection and Rapid Recovery Framework for Language Models
Abstract
This paper presents LM-Fix, a lightweight detection and rapid recovery framework for faults in large language models (LLMs). Existing integrity approaches are often heavy or slow for modern LLMs. LM-Fix runs a short test-vector pass and uses hash-guided checks to detect bit-flip faults, then repairs them locally without a full reload. Across multiple models, it detects over 94% of single-bit flips at TVL=200 and nearly 100% of multi-bit flips with approximately 1% to 7.7% runtime overhead; recovery is more than 100x faster than reloading. These results show a practical, low-overhead solution to keep LLMs reliable in production
Metadata
- Journal
- Proc. IEEE Int. Conf. on Computer Design (ICCD), 2025, pp. 432-440
- Comment
- Accepted at IEEE ICCD 2025. Code: https://github.com/ata990/lm-fix. Detects over 94 percent single-bit flips (near 100 percent multi-bit) with about 1 to 7.7 percent overhead; recovery is over 100x faster than a full reload. Keywords: LLMs, bit-flip, fault injection, reliability, security, Rowhammer, SDC, Jailbreaking, Attack, Defense, GPU DRAM faults
Pro Analysis
Full threat analysis, ATLAS technique mapping, compliance impact assessment (ISO 42001, EU AI Act), and actionable recommendations are available with a Pro subscription.