
The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code

Gabriel Hortea, Juan Tapiador

cs.CR

Published: May 5, 2026

Updated: May 6, 2026


Abstract

Malware authors have traditionally relied on polymorphic techniques to produce variants in the same malware family, complicating signature-based detection. Integrating generative AI into offensive toolchains enables attackers to synthesize structurally diverse payloads with identical behavior, raising the question of how much polymorphism LLMs provide. Recent work has assumed that LLMs can produce sufficiently polymorphic payloads, leaving unquantified the variation that emerges when an attacker repeatedly builds the same payload, or explicitly instructs the model to avoid prior implementations. In this work, we measure the polymorphic capacity of a commercial model (Claude Opus 4.6) as an automated malware generator. We build a dual-agent, four-stage pipeline that generates, tests, and refines a data-exfiltration payload comprising file traversal, encryption, exfiltration, and integration. We produce payloads in two settings: using prompts that specify only functional requirements, and using prompts that inject a structured history of prior outcomes to force divergence. We measure pairwise distances along structural (AST) and semantic (embedding) axes, finding that when polymorphism is not explicitly required, structural distances are high while semantic distances remain low; i.e., implementations diverge widely without changing high-level behavior. Explicit prompting substantially amplifies this structural diversity while preserving correctness, at the cost of roughly 5 times more tokens but only a small increase in LLM calls (from 4.2 to 4.5 per payload, with effective API costs of $0.41 and $0.73). These results show that a single commercial LLM can cheaply generate large populations of behaviorally equivalent yet structurally diverse payloads, facilitating the evasion of signature-based detection rules and similarity-based clustering.
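
To make the idea of a pairwise structural (AST) distance concrete, here is a minimal sketch on benign toy functions. The abstract does not specify the paper's actual metric, so everything here is an assumption for illustration: the helper names (`node_types`, `structural_distance`, `pairwise_distances`) are hypothetical, and the distance is simply one minus the `difflib.SequenceMatcher` ratio over the sequence of AST node type names.

```python
import ast
from difflib import SequenceMatcher
from itertools import combinations


def node_types(source: str) -> list[str]:
    """Flatten a program into the sequence of its AST node type names.

    Identifier names and literal values are discarded, so two programs
    that differ only in naming map to the same sequence.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def structural_distance(a: str, b: str) -> float:
    """Assumed metric: 1 - similarity ratio of the two node-type sequences."""
    matcher = SequenceMatcher(None, node_types(a), node_types(b), autojunk=False)
    return 1.0 - matcher.ratio()


def pairwise_distances(sources: list[str]) -> dict[tuple[int, int], float]:
    """Distance for every unordered pair of samples, keyed by index pair."""
    return {
        (i, j): structural_distance(sources[i], sources[j])
        for i, j in combinations(range(len(sources)), 2)
    }


# Three behaviorally equivalent toy implementations of "sum a list".
SAMPLES = [
    "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n",
    "def total(xs):\n    return sum(xs)\n",
    "def total(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n",
]

if __name__ == "__main__":
    for pair, dist in pairwise_distances(SAMPLES).items():
        print(pair, round(dist, 3))
```

In this sketch, samples 0 and 2 differ only in identifier names and so have distance 0, while the loop-based and `sum`-based versions are structurally far apart despite identical behavior. A semantic (embedding) distance would need an embedding model and is not sketched here.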
