Catak Ferhat Ozgur, Yazı Ahmet Faruk, Elezaj Ogerta, Ahmed Javed
Department of Information Security and Communication Technology, NTNU Norwegian University of Science and Technology, Gjøvik, Norway.
TUBITAK Bilgem Cyber Security Institute, Kocaeli, Turkey.
PeerJ Comput Sci. 2020 Jul 27;6:e285. doi: 10.7717/peerj-cs.285. eCollection 2020.
Malware has grown increasingly diverse in architecture and features. These growing capabilities pose a severe threat and open new research directions in malware detection. This study focuses on metamorphic malware, the most advanced member of the malware family. Anti-virus applications that rely on traditional signature-based methods find it nearly impossible to detect metamorphic malware, which also makes this type of malware difficult to classify. Recent literature on malware detection and classification discusses this issue in terms of malware behavior. The main goal of this paper is to develop a method that classifies malware by type based on its behavior. We began by building a new dataset of API calls made on the Windows operating system, which represent the behavior of malicious software. The malware types included in the dataset are Adware, Backdoor, Downloader, Dropper, Spyware, Trojan, Virus, and Worm. The classifier used in this study is an LSTM (Long Short-Term Memory) network, a method widely used for classifying sequential data. The classifier achieves an accuracy of up to 95% with an $F_1$-score of 0.83, which is quite satisfactory. We also ran experiments on binary and multi-class malware datasets to show the classification performance of the LSTM model. Another significant contribution of this paper is the new API-call-based dataset for the Windows operating system. To the best of our knowledge, no such dataset was available before our research. The dataset is available on GitHub so that the malware detection research community can benefit from it and contribute further to this domain.
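The abstract describes feeding Windows API call sequences to an LSTM but does not spell out the preprocessing. As a minimal sketch of one common way to prepare such data, the Python below maps API names to integer IDs and pads each trace to a fixed length. This is an assumption about the input encoding, not the authors' confirmed pipeline: the helper names (`build_vocab`, `encode`, `label_id`), the reserved IDs, and the two example traces are hypothetical and chosen only for illustration.

```python
from collections import Counter

# The eight malware types named in the abstract.
FAMILIES = ["Adware", "Backdoor", "Downloader", "Dropper",
            "Spyware", "Trojan", "Virus", "Worm"]

PAD_ID = 0  # hypothetical reserved ID for padding positions
UNK_ID = 1  # hypothetical reserved ID for API names missing from the vocabulary


def build_vocab(traces):
    """Map each distinct API name to an integer ID, most frequent first."""
    counts = Counter(call for trace in traces for call in trace)
    # Sort by descending count, then by name, so the IDs are deterministic.
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return {name: index + 2 for index, name in enumerate(ordered)}


def encode(trace, vocab, max_len):
    """Turn one trace into exactly max_len IDs: truncate first, then right-pad."""
    ids = [vocab.get(call, UNK_ID) for call in trace[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def label_id(family):
    """Return the integer class label for a malware type name."""
    return FAMILIES.index(family)


# Made-up traces for illustration only; they are not taken from the dataset.
traces = [
    ["LdrLoadDll", "NtCreateFile", "NtWriteFile", "NtClose"],
    ["LdrLoadDll", "RegOpenKeyExW", "NtClose"],
]
vocab = build_vocab(traces)
encoded = [encode(trace, vocab, max_len=5) for trace in traces]
```

The fixed-length integer rows in `encoded` are the kind of input an embedding layer followed by an LSTM typically expects, and `label_id` gives the matching class index for the multi-class setting.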