Qiu Shaoming, E Bicong, He Jingjie
School of Information Engineering, Dalian University, Dalian, China.
PLoS One. 2025 Apr 14;20(4):e0320808. doi: 10.1371/journal.pone.0320808. eCollection 2025.
Software defect prediction uses known software information to predict defects in target software. Models are generally built on features such as software metrics, semantic information, and software networks. However, because software structure is complex and samples are few, software features cannot be fully exploited without effective feature representation and extraction methods, which easily leads to misjudgments and reduced performance. In addition, a single feature cannot fully characterize the software structure. This research therefore proposes a new method to represent the Abstract Syntax Tree (AST) efficiently and accurately, together with a model called MFA (Multi Features Attention) that uses a deformable attention mechanism to extract features and a self-attention mechanism to fuse semantic and network features. In cross-version and cross-project experiments on 21 Java projects, comparing against multiple models, the proposed model reaches an average ACC, F1, and AUC of 0.7, 0.614, and 0.711 in the cross-version scheme and 0.687, 0.575, and 0.696 in the cross-project scheme. This is up to 41% better than the other models, and the fused features outperform any single feature, showing that extracting and fusing two features gives MFA an advantage in prediction performance.
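The abstract does not give the details of the fusion step, so the following is only a minimal sketch of what fusing a semantic feature vector and a network feature vector with self-attention might look like. It assumes scaled dot-product self-attention with identity projections (Q = K = V) followed by mean pooling. The function name `self_attention_fuse`, the vector sizes, and the example values are hypothetical and are not taken from the paper. The deformable attention extractor is not modelled here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention_fuse(tokens):
    """Fuse feature vectors with scaled dot-product self-attention.

    Assumption: no learned projections, so queries, keys and values are
    the input vectors themselves. The attended vectors are mean-pooled
    into one fused vector. Returns (fused_vector, attention_weights).
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    attended, weights = [], []
    for q in tokens:
        # Similarity of this vector to every vector (including itself).
        scores = [dot(q, k) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        attended.append(
            [sum(w_j * v[i] for w_j, v in zip(w, tokens)) for i in range(d)]
        )
    # Mean-pool the attended vectors into a single fused representation.
    fused = [sum(row[i] for row in attended) / len(attended) for i in range(d)]
    return fused, weights


# Hypothetical 4-dimensional features for one source file.
semantic = [0.2, 0.8, -0.1, 0.5]   # e.g. derived from the AST
network = [0.6, -0.3, 0.4, 0.1]    # e.g. derived from the software network
fused, weights = self_attention_fuse([semantic, network])
```

In a trained model the projections would be learned and the fused vector would feed a classifier. The sketch only shows how attention weights let each feature type draw on the other before pooling.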