
MSSA: multi-stage semantic-aware neural network for binary code similarity detection.

Authors

Wan Bangrui, Zhou Jianjun, Wang Ying, Chen Feng, Qian Ying

Affiliations

School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China.

Chongqing Engineering Research Center of Software Quality Assurance, Testing and Assessment, Chongqing, China.

Publication information

PeerJ Comput Sci. 2025 Jan 17;11:e2504. doi: 10.7717/peerj-cs.2504. eCollection 2025.

DOI: 10.7717/peerj-cs.2504
PMID: 39896042
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11784775/
Abstract

Binary code similarity detection (BCSD) aims to determine whether a pair of binary code snippets is similar, and is widely used for tasks such as malware analysis, patch analysis, and clone detection. Current state-of-the-art approaches are based on the Transformer architecture, which requires substantial computational resources. Learning-based approaches still leave room for improvement in learning the deeper semantics of binary code. In this paper, we propose MSSA, a multi-stage semantic-aware neural network for BCSD at the function level. It integrates the semantic and structural information of assembly instructions within basic blocks, between basic blocks, and across the entire function through four semantic-aware neural networks, achieving a deep understanding of binary code semantics. MSSA is a lightweight model with only 0.38M parameters in its backbone network, making it suitable for deployment in CPU environments. Experimental results show that MSSA outperforms Gemini, Asm2Vec, SAFE, and jTrans in classification performance and ranks second only to the Transformer-based jTrans in retrieval performance.
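
The abstract separates two evaluation settings for function-level BCSD: classification (is a given pair similar?) and retrieval (rank a pool of candidates against a query). The sketch below illustrates that distinction only. It assumes each function has already been mapped to a fixed-length embedding vector and that similarity is measured with cosine similarity; the abstract does not state MSSA's similarity measure, so the metric, the `threshold` value, the function names, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_similar(a, b, threshold=0.8):
    """Classification view: label a pair as similar or not.

    The threshold is a hypothetical value chosen for illustration.
    """
    return cosine_similarity(a, b) >= threshold


def rank_pool(query, pool):
    """Retrieval view: candidate names ordered from most to least similar."""
    scored = [(cosine_similarity(query, emb), name) for name, emb in pool.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored]


# Made-up embeddings standing in for the output of some function encoder.
query = [1.0, 0.0, 1.0]
pool = {
    "func_a": [0.9, 0.1, 1.1],
    "func_b": [-1.0, 0.5, 0.0],
    "func_c": [0.0, 1.0, 0.0],
}

ranking = rank_pool(query, pool)
print(ranking)
print(is_similar(query, pool["func_a"]))
```

In this toy setup, classification quality depends on where the threshold is placed, while retrieval quality depends only on the relative ordering of the candidates, which is one reason a model can lead on one task and not the other.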

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5e8a/11784775/ebb19e3483f9/peerj-cs-11-2504-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5e8a/11784775/700e113e3b87/peerj-cs-11-2504-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5e8a/11784775/ff0e5e3fe2e8/peerj-cs-11-2504-g002.jpg

Similar articles

1. MSSA: multi-stage semantic-aware neural network for binary code similarity detection.
   PeerJ Comput Sci. 2025 Jan 17;11:e2504. doi: 10.7717/peerj-cs.2504. eCollection 2025.
2. Ex2Vec: Enhancing assembly code semantics with end-to-end execution-aware embeddings.
   Neural Netw. 2025 Sep;189:107506. doi: 10.1016/j.neunet.2025.107506. Epub 2025 May 1.
3. Multi-semantic feature fusion attention network for binary code similarity detection.
   Sci Rep. 2023 Mar 12;13(1):4096. doi: 10.1038/s41598-023-31280-w.
4. CRaDLe: Deep code retrieval based on semantic Dependency Learning.
   Neural Netw. 2021 Sep;141:385-394. doi: 10.1016/j.neunet.2021.04.019. Epub 2021 Apr 26.
5. IoTSim: Internet of Things-Oriented Binary Code Similarity Detection with Multiple Block Relations.
   Sensors (Basel). 2023 Sep 11;23(18):7789. doi: 10.3390/s23187789.
6. Semantic aware-based instruction embedding for binary code similarity detection.
   PLoS One. 2024 Jun 11;19(6):e0305299. doi: 10.1371/journal.pone.0305299. eCollection 2024.
7. Cross-platform binary code similarity detection based on NMT and graph embedding.
   Math Biosci Eng. 2021 May 25;18(4):4528-4551. doi: 10.3934/mbe.2021230.
8. Hierarchical Recurrent Neural Hashing for Image Retrieval With Hierarchical Convolutional Features.
   IEEE Trans Image Process. 2018;27(1):106-120. doi: 10.1109/TIP.2017.2755766.
9. PBDiff: Neural network based program-wide diffing method for binaries.
   Math Biosci Eng. 2022 Jan 13;19(3):2774-2799. doi: 10.3934/mbe.2022127.
10. Syntactic-Semantic Detection of Clone-Caused Vulnerabilities in the IoT Devices.
    Sensors (Basel). 2024 Nov 13;24(22):7251. doi: 10.3390/s24227251.
