Department of Security Technology, China Mobile Research Institute, Beijing, 100053, China.
Data & AI Technology Company, China Telecom Corporation Ltd, Beijing, 100011, China.
Sci Rep. 2023 Mar 12;13(1):4096. doi: 10.1038/s41598-023-31280-w.
Binary code similarity detection (BCSD) plays an important role in binary application security testing. It is applied in several fields, such as software plagiarism detection, malware analysis, and vulnerability detection. Most existing research is based on recurrent neural networks, which have difficulty capturing the overall or long-distance semantic information of functions. In addition, existing works simply extract high-level semantic features and lack in-depth investigation of the potential mechanisms for fusing low-level and high-level semantic features. In this paper, we propose a multi-semantic feature fusion attention network (MFFA-Net) for BCSD. MFFA-Net contains two critical modules: semantic feature fusion (SFF) and attention feature fusion (AFF). The SFF module concatenates multiple semantic features to represent the semantics of a function, which helps capture the function's overall semantic information. The AFF module is designed to find useful information among the various features by assigning an attention matrix that models the relationships between them. To evaluate the proposed method, we conducted extensive experiments on two datasets. MFFA-Net achieves AUC values of 99.6% and 98.3% on the two datasets, respectively. The experimental results show that MFFA-Net achieves better performance for BCSD.
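The abstract does not give the internals of the SFF and AFF modules, so the following is only a minimal sketch of the general idea: several per-function feature vectors are related to one another through an attention matrix, and the attended features are concatenated into a single embedding that can be compared across functions. The scaled dot-product form of attention, the cosine similarity score, the feature dimensions, and all names (`attention_fuse`, `cosine_similarity`, `low_level`, `high_level`) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_fuse(features):
    """Fuse k feature vectors of dimension d into one embedding.

    features: array of shape (k, d), one row per semantic feature.
    Returns the concatenated attended features (shape (k * d,)) and
    the (k, k) attention matrix. Scaled dot-product attention is an
    assumption here, not the paper's stated mechanism.
    """
    d = features.shape[1]
    scores = features @ features.T / np.sqrt(d)  # pairwise feature affinity
    attn = softmax(scores, axis=-1)              # each row sums to 1
    attended = attn @ features                   # re-weight every feature
    return attended.reshape(-1), attn            # concatenate the k rows


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical low-level and high-level features for two functions.
rng = np.random.default_rng(0)
low_level = rng.normal(size=8)
high_level = rng.normal(size=8)

func_a = np.stack([low_level, high_level])
func_b = func_a + 0.01 * rng.normal(size=func_a.shape)  # near-duplicate

emb_a, attn_a = attention_fuse(func_a)
emb_b, _ = attention_fuse(func_b)
similarity = cosine_similarity(emb_a, emb_b)
```

In a trained model the attention weights and feature extractors would be learned; here the features are random placeholders, so the sketch only shows the data flow from multiple features to a single similarity score.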