Enhancing software defect prediction: a framework with improved feature selection and ensemble machine learning.

Authors

Ali Misbah, Mazhar Tehseen, Al-Rasheed Amal, Shahzad Tariq, Yasin Ghadi Yazeed, Amir Khan Muhammad

Affiliations

Department of Computer Science & Information Technology, Virtual University of Pakistan, Lahore, Pakistan.

Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Publication information

PeerJ Comput Sci. 2024 Feb 28;10:e1860. doi: 10.7717/peerj-cs.1860. eCollection 2024.

DOI: 10.7717/peerj-cs.1860
PMID: 39669467
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11636684/

Abstract

Effective software defect prediction is a crucial aspect of software quality assurance, enabling the identification of defective modules before the testing phase. This study aims to propose a comprehensive five-stage framework for software defect prediction, addressing the current challenges in the field. The first stage involves selecting a cleaned version of NASA's defect datasets, including CM1, JM1, MC2, MW1, PC1, PC3, and PC4, ensuring the data's integrity. In the second stage, a feature selection technique based on the genetic algorithm is applied to identify the optimal subset of features. In the third stage, three heterogeneous binary classifiers, namely random forest, support vector machine, and naïve Bayes, are implemented as base classifiers. Through iterative tuning, the classifiers are optimized to achieve the highest level of accuracy individually. In the fourth stage, an ensemble machine-learning technique known as voting is applied as a master classifier, leveraging the collective decision-making power of the base classifiers. The final stage evaluates the performance of the proposed framework using five widely recognized performance evaluation measures: precision, recall, accuracy, F-measure, and area under the curve. Experimental results demonstrate that the proposed framework outperforms state-of-the-art ensemble and base classifiers employed in software defect prediction and achieves a maximum accuracy of 95.1%, showing its effectiveness in accurately identifying software defects. The framework also evaluates its efficiency by calculating execution times. Notably, it exhibits enhanced efficiency, significantly reducing the execution times during the training and testing phases by an average of 51.52% and 52.31%, respectively. This reduction contributes to a more computationally economical solution for accurate software defect prediction.
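
The stages described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not state the genetic operators, the fitness function, or whether voting is hard or soft, so the choices below (truncation selection, one-point crossover, bit-flip mutation, hard majority voting, and a vote-fraction score for the area under the curve) are assumptions, and every function name is hypothetical. The base classifiers themselves are left out; the sketch starts from their predicted labels.

```python
import random
from collections import Counter


def ga_select(n_features, fitness, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Return a 0/1 feature mask found by a small genetic algorithm.

    Assumed operators (not taken from the paper): truncation selection,
    one-point crossover, bit-flip mutation. Requires n_features >= 2.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


def majority_vote(predictions):
    """Hard voting: `predictions` holds one label list per base classifier."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def vote_fraction(predictions):
    """Share of base classifiers that predict the defective class (label 1)."""
    return [sum(labels) / len(labels) for labels in zip(*predictions)]


def auc(y_true, scores):
    """Rank-based AUC; needs at least one positive and one negative sample."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, y_pred, scores):
    """Compute the five measures named in the abstract (1 = defective)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / len(y_true),
        "f_measure": f_measure,
        "auc": auc(y_true, scores),
    }


if __name__ == "__main__":
    # Toy labels standing in for random forest, SVM and naive Bayes outputs.
    preds = [[1, 0, 1, 0, 0, 1], [1, 0, 0, 1, 0, 0], [0, 0, 1, 1, 1, 1]]
    y_true = [1, 0, 1, 1, 0, 0]
    print(evaluate(y_true, majority_vote(preds), vote_fraction(preds)))
    # Toy fitness: rewards features 0 and 2, penalises mask size.
    print(ga_select(5, lambda m: m[0] + m[2] - 0.1 * sum(m)))
```

In a realistic setting the `fitness` callable would presumably score a mask by training a classifier on the selected columns and measuring validation accuracy, which is far more expensive than the toy fitness used here. With three binary base classifiers, hard voting cannot tie, which is why no tie-breaking rule appears in `majority_vote`.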



Similar articles

1. Enhancing software defect prediction: a framework with improved feature selection and ensemble machine learning.
   PeerJ Comput Sci. 2024 Feb 28;10:e1860. doi: 10.7717/peerj-cs.1860. eCollection 2024.
2. Ensemble of heterogeneous classifiers for diagnosis and prediction of coronary artery disease with reduced feature subset.
   Comput Methods Programs Biomed. 2021 Jan;198:105770. doi: 10.1016/j.cmpb.2020.105770. Epub 2020 Sep 30.
3. Prediction of diabetes disease using an ensemble of machine learning multi-classifier models.
   BMC Bioinformatics. 2023 Sep 12;24(1):337. doi: 10.1186/s12859-023-05465-z.
4. An ensemble learning-based feature selection algorithm for identification of biomarkers of renal cell carcinoma.
   PeerJ Comput Sci. 2024 Jan 4;10:e1768. doi: 10.7717/peerj-cs.1768. eCollection 2024.
5. Classification framework for faulty-software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning.
   Appl Intell (Dordr). 2023 Feb 9:1-43. doi: 10.1007/s10489-022-04427-x.
6. A Novel Rank Aggregation-Based Hybrid Multifilter Wrapper Feature Selection Method in Software Defect Prediction.
   Comput Intell Neurosci. 2021 Nov 24;2021:5069016. doi: 10.1155/2021/5069016. eCollection 2021.
7. Development of an efficient novel method for coronary artery disease prediction using machine learning and deep learning techniques.
   Technol Health Care. 2024;32(6):4545-4569. doi: 10.3233/THC-240740.
8. Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers.
   Med Phys. 2018 Jul;45(7):3449-3459. doi: 10.1002/mp.12967. Epub 2018 Jun 13.
9. An Ensemble Feature Selection Approach-Based Machine Learning Classifiers for Prediction of COVID-19 Disease.
   Int J Telemed Appl. 2024 Apr 17;2024:8188904. doi: 10.1155/2024/8188904. eCollection 2024.
10. An empirical analysis on webservice antipattern prediction in different variants of machine learning perspective.
    Sci Rep. 2025 Feb 12;15(1):5183. doi: 10.1038/s41598-025-86454-5.

Cited by

1. Feature selection using a multi-strategy improved parrot optimization algorithm in software defect prediction.
   PeerJ Comput Sci. 2025 Apr 16;11:e2815. doi: 10.7717/peerj-cs.2815. eCollection 2025.
2. Addressing imbalanced data classification with Cluster-Based Reduced Noise SMOTE.
   PLoS One. 2025 Feb 10;20(2):e0317396. doi: 10.1371/journal.pone.0317396. eCollection 2025.
