Zhang Yulin, Hu Yong, Chen Xiao
School of Cyber Science and Engineering, Sichuan University, Chengdu 610207, China.
Sensors (Basel). 2024 Feb 20;24(5):1351. doi: 10.3390/s24051351.
As open-source libraries and secondary development become more widespread, software projects are increasingly exposed to security vulnerabilities. Existing studies on source code vulnerability detection rely on natural language processing techniques, which overlook the intricate dependencies within programming languages. To address this, we propose a framework called Context and Multi-Features-based Vulnerability Detection (CMFVD). CMFVD integrates source code graphs with textual sequences and uses a novel slicing method, Context Slicing, to capture contextual information. The framework combines graph convolutional networks (GCNs) and bidirectional gated recurrent units (BGRUs) with attention mechanisms to extract local semantic and syntactic information. Experiments on the Software Assurance Reference Datasets (SARDs) demonstrate CMFVD's effectiveness: it achieves the highest F1-score among the compared models, 0.986. CMFVD offers a promising approach to identifying and rectifying security flaws in large-scale codebases.
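To make the graph side of the architecture concrete, the following is a minimal sketch of a single graph-convolution step over a toy code graph. It assumes the widely used symmetric-normalised propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W); the abstract does not state which GCN variant CMFVD uses, so this is an illustration and not the authors' implementation. The names `gcn_layer` and `matmul`, the three-node graph, and all numeric values are hypothetical.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product (a is n x k, b is k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    Assumed standard formulation, not taken from the paper.
    """
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Node degrees of the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features, project them, then apply ReLU.
    projected = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical code graph: three statements linked in a chain (0 - 1 - 2).
adj = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
# Hypothetical 2-dimensional node features (e.g. statement embeddings).
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Identity weights keep the arithmetic easy to follow.
weight = [[1.0, 0.0], [0.0, 1.0]]

hidden = gcn_layer(adj, feats, weight)
for node, vec in enumerate(hidden):
    print(node, [round(v, 3) for v in vec])
```

After one step, each statement's vector mixes in the features of the statements it is connected to, which is the kind of structural dependency that a purely sequential text model does not see. In a framework of the kind the abstract describes, such node representations would be combined with sequence features from a recurrent encoder; how CMFVD fuses the two is not specified in the abstract.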