Tang Xueyan, Du Yuying, Lai Alan, Zhang Ze, Shi Lingzhi
Salus Security, Beijing, 100020, China.
Sci Rep. 2023 Nov 16;13(1):20106. doi: 10.1038/s41598-023-47219-0.
This paper explores the application of deep learning to smart contract vulnerability detection. Smart contracts are an essential part of blockchain technology and are crucial for developing decentralized applications. However, smart contract vulnerabilities can cause financial losses and system crashes. Static analysis tools are frequently used to detect vulnerabilities in smart contracts, but they often produce false positives and false negatives because they rely heavily on predefined rules and lack semantic analysis capabilities. Furthermore, these predefined rules quickly become obsolete and fail to adapt or generalize to new data. In contrast, deep learning methods do not require predefined detection rules and can learn the features of vulnerabilities during training. In this paper, we introduce Lightning Cat, a solution based on deep learning techniques. We train three deep learning models for detecting vulnerabilities in smart contracts: Optimized-CodeBERT, Optimized-LSTM, and Optimized-CNN. Experimental results show that, within Lightning Cat, the Optimized-CodeBERT model surpasses the other methods, achieving an F1-score of 93.53%. To extract vulnerability features precisely, we use segments of the vulnerable functions rather than whole contracts, which retains the critical vulnerability features. Using the pre-trained CodeBERT model for data preprocessing, we capture the syntax and semantics of the code more accurately. To demonstrate the feasibility of the proposed solution, we evaluate its performance on the SolidiFI-benchmark dataset, which consists of 9369 vulnerable contracts injected with vulnerabilities of seven different types.
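The function-level extraction step mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal example, assuming that a "segment" is the full text of one Solidity function and that simple brace matching is enough to find its end. The names `FUNC_START`, `extract_functions`, and the `EXAMPLE` contract are hypothetical, and the sketch ignores braces inside strings or comments.

```python
import re
from typing import List

# Hypothetical pattern: start of a named Solidity function definition.
FUNC_START = re.compile(r"\bfunction\s+([A-Za-z_]\w*)\s*\(")


def extract_functions(source: str) -> List[str]:
    """Return the text of each function definition found in `source`.

    Assumption: braces are balanced and do not occur inside string
    literals or comments. A real pipeline would use a Solidity parser.
    """
    segments = []
    for match in FUNC_START.finditer(source):
        open_idx = source.find("{", match.end())
        if open_idx == -1:
            continue
        # Skip body-less declarations (e.g. in interfaces), which end
        # with ';' before any opening brace.
        semi_idx = source.find(";", match.end())
        if semi_idx != -1 and semi_idx < open_idx:
            continue
        depth = 0
        for i in range(open_idx, len(source)):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    # Keep the signature and the complete body.
                    segments.append(source[match.start():i + 1])
                    break
    return segments


# Illustrative contract (not taken from the SolidiFI-benchmark dataset).
EXAMPLE = """
contract Wallet {
    mapping(address => uint) balances;

    function deposit() public payable {
        balances[msg.sender] += msg.value;
    }

    function withdraw(uint amount) public {
        if (balances[msg.sender] >= amount) {
            msg.sender.call.value(amount)("");
            balances[msg.sender] -= amount;
        }
    }
}
"""

if __name__ == "__main__":
    for segment in extract_functions(EXAMPLE):
        print(segment)
        print("-" * 40)
```

Each returned segment could then be passed to a tokenizer and classifier. Working at function granularity keeps the input short and centered on the code that carries the vulnerability, which is the motivation the abstract gives for this step.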