Semantic aware-based instruction embedding for binary code similarity detection.

Affiliation

College of Information Engineering, Zhejiang University of Technology, Hangzhou, Zhejiang, China.

Publication information

PLoS One. 2024 Jun 11;19(6):e0305299. doi: 10.1371/journal.pone.0305299. eCollection 2024.

Abstract

Binary code similarity detection plays a crucial role in binary security applications such as vulnerability detection and malicious software analysis. However, existing methods yield binary embedding representations with limited discriminative power across different compilation environments, and these representations lack dynamic high-level semantics. Moreover, current approaches often neglect multi-level semantic feature extraction and therefore fail to capture precise semantic information about the binary code. To address these limitations, this paper introduces a novel detection solution called BinBcla. The method employs an enhanced pre-training model to generate instruction embeddings with dynamic semantics for binary functions. A multi-feature fusion technique then extracts local semantic information and long-distance global features from the code, and self-attention is used to capture the code's structural information. Finally, an improved cosine similarity method learns relationships among all elements of the distance vectors, enhancing the model's robustness to new sample functions. Experiments are conducted across different architectures, compilers, and optimization levels. The results indicate that BinBcla achieves higher accuracy, precision, and F1 score than existing methods.
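
The comparison step described in the abstract can be illustrated with a minimal sketch. This is not BinBcla itself: it assumes per-instruction embedding vectors are already available, substitutes plain mean pooling for the paper's multi-feature fusion, and uses standard cosine similarity rather than the improved variant the authors describe. The names `mean_pool`, `cosine_similarity`, `is_similar`, and the `threshold` value are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(instruction_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average per-instruction vectors into one function-level vector.

    Assumption: mean pooling stands in for the paper's feature fusion.
    """
    if not instruction_embeddings:
        raise ValueError("need at least one instruction embedding")
    dim = len(instruction_embeddings[0])
    if any(len(vec) != dim for vec in instruction_embeddings):
        raise ValueError("all instruction embeddings must share one dimension")
    count = len(instruction_embeddings)
    return [sum(vec[i] for vec in instruction_embeddings) / count
            for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_similar(func_a: Sequence[Sequence[float]],
               func_b: Sequence[Sequence[float]],
               threshold: float = 0.9) -> bool:
    """Decide whether two functions match (threshold is an arbitrary example)."""
    score = cosine_similarity(mean_pool(func_a), mean_pool(func_b))
    return score >= threshold
```

In this sketch, two builds of the same function would be passed to `is_similar` as lists of instruction vectors, one list per compilation setting. A learned similarity head, as the abstract suggests, would replace the fixed threshold with a trained decision over the distance vector.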

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dbf5/11166306/49cd8c5a79ab/pone.0305299.g001.jpg
