Gray-Box Poisoning of Continuous Malware Ingestion Pipelines
Abstract
Modern malware detection pipelines rely on continuous data ingestion and machine learning to counter the high volume of novel threats. This work investigates a realistic gray-box poisoning threat model targeting these pipelines. Using the secml_malware framework, we generate problem-space adversarial binaries through functionality-preserving manipulations, specifically Import Address Table (IAT) and section injections. We evaluate the impact of these poisoned samples when ingested into a defender's training set for a LightGBM malware detection model. Our empirical results demonstrate that subtle IAT-based perturbations enable compact poisoning samples that significantly degrade detection recall. These findings illustrate the inherent challenge of developing low-visibility adversarial perturbations that maintain high poisoning efficacy within continuous learning systems. We further evaluate a defense mechanism based on a homogeneous ensemble, which successfully identifies and filters up to 95.6% of poisoning attempts while maintaining a high retention rate for legitimate data. These findings emphasize the necessity of robust pre-ingestion validation in production pipelines.
Pro Analysis
Full threat analysis, ATLAS technique mapping, compliance impact assessment (ISO 42001, EU AI Act), and actionable recommendations are available with a Pro subscription.
AI Threat Alert