Binary code similarity analysis based on naming function and common vector space.

Authors

Xia Bing, Pang Jianmin, Zhou Xin, Shan Zheng, Wang Junchao, Yue Feng

Affiliations

State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, China.

Zhongyuan University of Technology, Zhengzhou, China.

Publication information

Sci Rep. 2023 Sep 21;13(1):15676. doi: 10.1038/s41598-023-42769-9.

Abstract

Binary code similarity analysis is widely used in vulnerability search, where source code may not be available, to determine whether two binary functions are similar. Building on deep learning and natural language processing techniques, several approaches have been proposed that perform cross-platform binary code similarity analysis using control flow graphs. However, existing schemes have three shortcomings: instruction syntax differs greatly across target platforms, control flow graph nodes cannot be aligned, and stable high-level semantics are rarely incorporated. These shortcomings make it difficult to identify similar computations between binary functions compiled from the same source code for different platforms. We argue that extracting stable, platform-independent semantics can improve model accuracy, and we propose N_Match, a cross-platform binary function similarity comparison model. The model lifts instructions from different platforms into a common semantic space to mask their underlying platform-specific differences, uses graph embedding to learn the stable semantics of neighbors, extracts high-level knowledge of naming functions to mitigate the differences introduced by different platforms and optimization levels, and combines the stable graph structure with the stable, platform-independent API knowledge of naming functions to represent the final semantics of a function. Experimental results show that N_Match is more accurate than the baseline model in cross-platform, cross-optimization-level, and industrial scenarios. In the vulnerability search experiment, N_Match significantly improves hit@N, and its mAP exceeds that of the current graph embedding model by 66%. We also report several interesting observations from the experiments. The code and model are publicly available at https://www.github.com/CSecurityZhongYuan/Binary-Name_Match.
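As an illustration of the retrieval metrics named in the abstract, the sketch below ranks candidate functions by cosine similarity to a query embedding and then computes hit@N and average precision (mAP is the mean of average precision over all queries). This is a minimal sketch under stated assumptions, not the authors' implementation: the two-dimensional vectors, the function names (`vuln_arm`, `vuln_mips`, and so on), and the use of cosine similarity are all hypothetical, and the paper's exact metric definitions may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query_vec, candidates):
    """Return candidate names sorted by descending similarity to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


def hit_at_n(ranking, relevant, n):
    """Return 1 if any relevant item appears in the top n results, else 0."""
    return int(any(name in relevant for name in ranking[:n]))


def average_precision(ranking, relevant):
    """Average of precision@k over the ranks k at which a relevant item appears."""
    hits, total = 0, 0.0
    for k, name in enumerate(ranking, start=1):
        if name in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


# Hypothetical embeddings: made-up vectors, not produced by N_Match.
query = [1.0, 0.0]
candidates = {
    "vuln_arm": [0.9, 0.1],
    "unrelated": [0.0, 1.0],
    "vuln_mips": [0.6, 0.8],
    "other": [0.7, 0.7],
}
relevant = {"vuln_arm", "vuln_mips"}  # assumed ground-truth matches

ranking = rank_candidates(query, candidates)
print(ranking)
print("hit@1:", hit_at_n(ranking, relevant, 1))
print("AP:", average_precision(ranking, relevant))
```

In a vulnerability search setting, the query would be the embedding of a known vulnerable function and the candidates would be the functions of a target binary; averaging `average_precision` over many such queries gives mAP.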
