A Novel Rank Aggregation-Based Hybrid Multifilter Wrapper Feature Selection Method in Software Defect Prediction.

Affiliations

Department of Computer and Information Science, Universiti Teknologi PETRONAS, Bandar Seri Iskandar 32610, Perak, Malaysia.

Department of Computer Science, University of Ilorin, Ilorin 1515, Nigeria.

Publication information

Comput Intell Neurosci. 2021 Nov 24;2021:5069016. doi: 10.1155/2021/5069016. eCollection 2021.

Abstract

The high dimensionality of software metric features has long been noted as a data quality problem that degrades the performance of software defect prediction (SDP) models. This drawback makes it necessary to apply feature selection (FS) algorithms in SDP processes. FS approaches fall into three types: filter FS (FFS), wrapper FS (WFS), and hybrid FS (HFS). HFS has been established as superior because it combines the strengths of both FFS and WFS methods. However, selecting the most appropriate FFS method for HFS (the filter rank selection problem) is a challenge because the performance of FFS methods depends on the choice of datasets and classifiers. In addition, HFS inherits the local optima stagnation and high computational cost that WFS suffers from in large search spaces. As a solution, this study proposes a novel rank aggregation-based hybrid multifilter wrapper feature selection (RAHMFWFS) method for selecting relevant and non-redundant features from software defect datasets. The proposed RAHMFWFS consists of two sequential stages. The first stage is a rank aggregation-based multifilter feature selection (RMFFS) method that addresses the filter rank selection problem by aggregating the individual rank lists from multiple filter methods, using a novel rank aggregation method to generate a single, robust, and non-disjoint rank list. In the second stage, the aggregated ranked features are further processed by an enhanced wrapper feature selection (EWFS) method based on a dynamic reranking strategy that guides the feature subset selection process of the HFS method. This, in turn, reduces the number of evaluation cycles while improving or maintaining prediction performance. The feasibility of the proposed RAHMFWFS was demonstrated on benchmark software defect datasets with Naïve Bayes and Decision Tree classifiers, based on accuracy, area under the curve (AUC), and F-measure values. The experimental results showed the effectiveness of RAHMFWFS in addressing the filter rank selection and local optima stagnation problems in HFS, as well as its ability to select optimal features from SDP datasets while maintaining or enhancing the performance of SDP models. Overall, the proposed RAHMFWFS improved the prediction performance of SDP models across the selected datasets compared with existing state-of-the-art HFS methods.
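
The two-stage idea can be illustrated with a minimal sketch. The abstract does not specify the paper's aggregation rule or its dynamic reranking strategy, so this example swaps in stand-ins: a mean-position (Borda-style) aggregation for stage one and a plain single-pass incremental wrapper for stage two. The function names, feature names, filter rankings, and the toy scoring function are all hypothetical; in practice the score would come from training a classifier such as Naïve Bayes or a Decision Tree and measuring accuracy, AUC, or F-measure.

```python
from statistics import mean


def aggregate_ranks(rank_lists):
    """Merge several rankings (best feature first) into one by mean position.

    Assumption: this is a generic Borda-style rule, not the paper's method.
    """
    features = set(rank_lists[0])
    for ranks in rank_lists:
        if set(ranks) != features:
            raise ValueError("every filter must rank the same feature set")
    positions = {f: mean(r.index(f) for r in rank_lists) for f in features}
    # Sort by mean position; break ties by name so the result is deterministic.
    return sorted(features, key=lambda f: (positions[f], f))


def incremental_wrapper(ranked, evaluate):
    """Walk the ranking once, keeping a feature only if the score improves.

    Assumption: a simple stand-in for EWFS, without dynamic reranking.
    """
    selected, best = [], float("-inf")
    for feature in ranked:
        score = evaluate(selected + [feature])
        if score > best:
            selected, best = selected + [feature], score
    return selected, best


# Hypothetical rankings from three filter methods over four made-up metrics.
filter_a = ["loc", "cyclomatic", "comments", "operands"]
filter_b = ["cyclomatic", "loc", "operands", "comments"]
filter_c = ["loc", "operands", "cyclomatic", "comments"]

aggregated = aggregate_ranks([filter_a, filter_b, filter_c])


def toy_score(subset):
    # Hypothetical score: reward two "useful" features, penalize subset size.
    useful = {"loc", "operands"}
    return 10 * len(useful & set(subset)) - len(subset)


selected, best_score = incremental_wrapper(aggregated, toy_score)
print(aggregated)
print(selected, best_score)
```

Because the wrapper visits each ranked feature only once, it needs one evaluation per feature rather than a search over all subsets, which is the kind of saving in evaluation cycles the abstract refers to.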

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e406/8635927/f646c2427aa1/CIN2021-5069016.001.jpg
