School of Computer Science and Technology, Xinjiang University, Xinjiang Uygur Autonomous Region, Urumqi, People's Republic of China.
PLoS One. 2024 Jun 27;19(6):e0304066. doi: 10.1371/journal.pone.0304066. eCollection 2024.
The attribution classification of APT malware remains an important problem as the Internet continues to develop. Existing methods do not consider the DLLs (dynamic link libraries) and hidden file addresses involved during execution, and they fall short in capturing the local and global correlations of event behaviors. Compared with the structural features of binary code, opcode features reflect runtime instructions, yet existing approaches do not account for the repeated reuse of local operation behaviors within the same APT organization. In addition, attribution classification based on a single feature is more easily affected by obfuscation techniques. To address these issues, (1) an event behavior graph is constructed from API instructions and related operations, and a GNN model uses it to capture execution traces on the host; (2) ImageCNTM captures the local spatial correlation and continuous long-term dependencies of opcode images; and (3) word frequency and behavior features are concatenated and fused in a proposed multi-feature, multi-input deep learning model. We collected a publicly available dataset of APT malware to evaluate the method. The models based on a single feature reached attribution classification results of 89.24% and 91.91%, and the multi-feature fusion model achieves better classification performance than the single-feature classifiers.
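The fusion step in (3) can be illustrated with a minimal sketch. It is not the authors' implementation: the opcode vocabulary, the example events, the file path, and the helper names (`opcode_frequency`, `build_event_graph`, `graph_summary`, `fuse`) are all hypothetical, and the two-number graph summary merely stands in for the embedding a trained GNN would produce.

```python
from collections import Counter

# Hypothetical opcode vocabulary; the vocabulary used in the paper is not stated here.
OPCODE_VOCAB = ["mov", "push", "call", "jmp", "xor"]


def opcode_frequency(opcodes, vocab=OPCODE_VOCAB):
    """Return the normalized frequency of each vocabulary opcode in the sequence."""
    counts = Counter(opcodes)
    total = sum(counts[op] for op in vocab) or 1  # avoid division by zero
    return [counts[op] / total for op in vocab]


def build_event_graph(events):
    """Build a directed graph from (source, api_call, target) event triples.

    Each node maps to the set of (api_call, target) edges that leave it.
    """
    graph = {}
    for source, api_call, target in events:
        graph.setdefault(source, set()).add((api_call, target))
        graph.setdefault(target, set())  # register targets that have no outgoing edges
    return graph


def graph_summary(graph):
    """Assumed stand-in for a learned graph embedding: [node count, edge count]."""
    return [float(len(graph)), float(sum(len(edges) for edges in graph.values()))]


def fuse(frequency_features, behavior_features):
    """Fuse two feature vectors by simple concatenation."""
    return list(frequency_features) + list(behavior_features)


# Illustrative events: a DLL load and a file creation (the path is made up).
events = [
    ("sample.exe", "LoadLibraryA", "kernel32.dll"),
    ("sample.exe", "CreateFileW", "C:\\Users\\Public\\hidden.dat"),
]
opcodes = ["push", "mov", "call", "mov"]

fused = fuse(opcode_frequency(opcodes), graph_summary(build_event_graph(events)))
print(fused)
```

In a multi-input model, each branch would typically learn its own representation before concatenation; the plain lists here only show how the two feature sources end up in a single vector for the classifier.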