College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang, China.
PLoS One. 2019 Aug 26;14(8):e0211373. doi: 10.1371/journal.pone.0211373. eCollection 2019.
With the exponential increase in malware, homology analysis has become an active research topic in the malware detection field. This paper proposes MHAS, a malware homology analysis system based on ensemble learning and multiple features. MHAS generates grayscale images from malware binary files and then uses the disassembler IDA Pro to extract opcode sequences and system call graphs. These are then used to generate RGB images and M-images on the image matrix. Next, MHAS uses convolutional neural networks (CNNs) as base learners and performs bagging ensemble learning to learn features from the grayscale images, RGB images and M-images. MHAS then integrates the nine base learners using voting, learning and selective ensemble (in that order) and maps the integration results to the result matrix. Finally, the result matrix is integrated again using the learning method to obtain the final malware classification result. To verify the accuracy of MHAS, we performed a malware family classification experiment on samples from 10 malware families. The results showed that MHAS reached an accuracy of 99.17%, indicating that it can effectively analyze and identify malware families.
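To make two of the steps above more concrete, the following is a minimal Python sketch of (a) turning a binary file's bytes into a grayscale image matrix and (b) combining base-learner predictions by voting. It is an illustration under stated assumptions, not the MHAS implementation: the abstract does not give the image width, the padding rule, or the exact voting scheme, so the fixed `width`, the zero padding, the plurality vote with ties broken toward the smallest label, and the function names `bytes_to_grayscale` and `majority_vote` are all hypothetical choices.

```python
from collections import Counter
from typing import List, Sequence


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte (0-255) to one pixel and arrange pixels in rows of `width`.

    Assumption: the last row is zero-padded; the paper may use another rule.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        raise ValueError("data must not be empty")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])  # bytes -> list of ints
        row.extend([0] * (width - len(row)))   # pad a short final row
        rows.append(row)
    return rows


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine labels by plurality vote.

    predictions[i][j] is the label base learner i assigns to sample j.
    Assumption: ties are broken in favor of the smallest label.
    """
    voted = []
    for sample_labels in zip(*predictions):  # iterate over samples
        counts = Counter(sample_labels)
        best = max(counts.values())
        voted.append(min(lbl for lbl, n in counts.items() if n == best))
    return voted


if __name__ == "__main__":
    image = bytes_to_grayscale(bytes(range(10)), width=4)
    print(len(image), len(image[0]))
    # Three hypothetical base learners, three samples.
    print(majority_vote([[0, 1, 2], [0, 1, 1], [1, 1, 2]]))
```

In a real pipeline the image matrix would be resized to a fixed input shape for the CNNs, and the voting step would be only the first of the three integration stages the abstract lists.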