School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, KwaZulu-Natal, South Africa.
Department of Computer Science, Federal University of Lafia, Lafia, Nasarawa State, Nigeria.
PLoS One. 2022 Oct 6;17(10):e0274850. doi: 10.1371/journal.pone.0274850. eCollection 2022.
Selecting appropriate feature subsets is a vital task in machine learning. Its main goal is to remove noisy, irrelevant, and redundant features that could degrade the learning model's accuracy, and thereby to improve classification performance without loss of information. Consequently, increasingly advanced optimization methods have been employed to locate the optimal feature subset. This paper presents a binary version of the dwarf mongoose optimization algorithm, called BDMO, to solve the high-dimensional feature selection problem. The effectiveness of this approach was validated on 18 high-dimensional datasets from the Arizona State University feature selection repository, and the efficacy of BDMO was compared with that of other well-known feature selection techniques in the literature. The results show that BDMO outperforms the other methods, producing the lowest average fitness value in 14 of the 18 datasets, that is, the best overall fitness in 77.78% of cases. BDMO also demonstrated stability, returning the lowest standard deviation (SD) in 13 of the 18 datasets (72.22%). Furthermore, it achieved higher validation accuracy than the other methods in 15 of the 18 datasets (83.33%). The proposed approach also attained the highest attainable validation accuracy on the COIL20 and Leukemia datasets, which clearly demonstrates the superiority of BDMO.
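The abstract does not state how BDMO binarizes the continuous mongoose positions or exactly which fitness it minimizes. As an assumption, not the paper's confirmed method, the sketch below shows one formulation that is common in wrapper-based feature selection: an S-shaped (sigmoid) transfer function turns each continuous coordinate into a select/drop bit, and fitness is a weighted sum of classification error and the fraction of features kept (lower is better). The names `sigmoid_transfer`, `binarize`, and `fitness` and the weight `alpha = 0.99` are hypothetical; the error rate would come from whichever classifier is used for evaluation.

```python
import math
import random


def sigmoid_transfer(x: float) -> float:
    """S-shaped transfer function (assumed): maps a continuous position to a probability.

    Written in a numerically stable form so very large |x| does not overflow math.exp.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position: list[float], rng: random.Random) -> list[int]:
    """Map each continuous coordinate to 1 (feature selected) or 0 (feature dropped)."""
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]


def fitness(error_rate: float, mask: list[int], alpha: float = 0.99) -> float:
    """Hypothetical wrapper fitness: alpha * error + (1 - alpha) * (selected / total)."""
    n_total = len(mask)
    if n_total == 0:
        raise ValueError("mask must contain at least one feature")
    n_selected = sum(mask)
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)


if __name__ == "__main__":
    rng = random.Random(42)
    mask = binarize([2.0, -1.5, 0.3, -3.0], rng)  # stochastic select/drop bits
    print(mask, fitness(0.10, mask))
```

For example, `fitness(0.10, [1, 0, 1, 0])` evaluates to approximately 0.104 (0.99 × 0.10 + 0.01 × 2/4). The heavy weight on error reflects the usual priority of accuracy over subset size; a smaller `alpha` would push the search toward sparser subsets.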