Defense MEDIUM relevance

MalCVE: Malware Detection and CVE Association Using Large Language Models

Eduard Andrei Cristea Petter Molnes Jingyue Li

cs.CR cs.SE

Published

October 17, 2025

Updated

February 2, 2026

Links

PDF arxiv

Abstract

Malicious software attacks are having an increasingly significant economic impact. Commercial malware detection software can be costly, and tools that attribute malware to the specific software vulnerabilities it exploits are largely lacking. Understanding the connection between malware and the vulnerabilities it targets is crucial for analyzing past threats and proactively defending against current ones. In this study, we propose an approach that leverages large language models (LLMs) to detect binary malware, specifically within JAR files, and uses LLM capabilities combined with retrieval-augmented generation (RAG) to identify Common Vulnerabilities and Exposures (CVEs) that malware may exploit. We developed a proof-of-concept tool, MalCVE, that integrates binary code decompilation, deobfuscation, LLM-based code summarization, semantic similarity search, and LLM-based CVE classification. We evaluated MalCVE using a benchmark dataset of 3,839 JAR executables. MalCVE achieved a mean malware-detection accuracy of 97%, at a fraction of the cost of commercial solutions. In particular, the results demonstrate that LLM-based code summarization enables highly accurate and explainable malware identification. MalCVE is also the first tool to associate CVEs with binary malware, achieving a recall@10 of 65%, which is comparable to studies that perform similar analyses on source code.

Metadata

DOI: 10.1145/3793655.3793735

Pro Analysis

Full threat analysis, ATLAS technique mapping, compliance impact assessment (ISO 42001, EU AI Act), and actionable recommendations are available with a Pro subscription.

Threat Deep-Dive

ATLAS Mapping

Compliance Reports

Actionable Recommendations

Start 14-Day Free Trial

Back to Research