Ahmed Shameem, Sheikh Khalid Hassan, Mirjalili Seyedali, Sarkar Ram
Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India.
King Abdulaziz University, Jeddah, Saudi Arabia.
Expert Syst Appl. 2022 Aug 15;200:116834. doi: 10.1016/j.eswa.2022.116834. Epub 2022 Mar 15.
The classification accuracy achieved by a machine learning technique depends on the feature set used in the learning process. However, not all the features extracted for a particular task contribute to the classification process. Feature selection (FS) is an important and challenging pre-processing technique that discards unnecessary and irrelevant features, reducing computational time and space requirements while increasing classification accuracy. Generalized Normal Distribution Optimizer (GNDO), a recently proposed meta-heuristic algorithm, can be applied to a wide range of optimization problems. In this paper, a hybrid of GNDO and Simulated Annealing (SA), called Binary Simulated Normal Distribution Optimizer (BSNDO), is proposed; it uses SA as a local search to achieve higher classification accuracy. The proposed method is evaluated on 18 well-known UCI datasets and compared with its predecessor as well as with some popular FS methods. The method is also tested on high-dimensional microarray datasets to assess its value on real-life data, and it is applied to a COVID-19 dataset for classification. The results demonstrate the usefulness of BSNDO as an FS method. The source code of this work is publicly available at https://github.com/ahmed-shameem/Feature_selection.
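The idea of using SA as a local search over a binary feature mask can be illustrated with a minimal sketch. This is not the authors' BSNDO implementation (see the repository above for that): the GNDO update step is omitted, and the function names `sa_local_search` and `toy_fitness`, the single-bit-flip neighbourhood, the cooling schedule, and all parameter values are assumptions chosen for illustration. In a real wrapper-based FS setting, the fitness would typically be a classifier's validation error, possibly combined with a penalty on the number of selected features.

```python
import math
import random

# Hypothetical ground truth for the toy example: only these features matter.
RELEVANT = {0, 2, 5}


def toy_fitness(mask):
    """Toy stand-in for a wrapper fitness (lower is better).

    Penalises each relevant feature that is left out, plus a small
    cost per selected feature to favour compact subsets.
    """
    missed = sum(1 for j in RELEVANT if mask[j] == 0)
    return missed + 0.01 * sum(mask)


def sa_local_search(mask, fitness, t0=1.0, cooling=0.95, steps=200, seed=0):
    """Refine a binary feature mask with simulated annealing."""
    rng = random.Random(seed)
    current = list(mask)
    current_fit = fitness(current)
    best, best_fit = list(current), current_fit
    temp = t0
    for _ in range(steps):
        # Neighbour: flip one randomly chosen bit (add or drop one feature).
        neighbour = list(current)
        i = rng.randrange(len(neighbour))
        neighbour[i] = 1 - neighbour[i]
        neighbour_fit = fitness(neighbour)
        delta = neighbour_fit - current_fit
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp), which shrinks as temp cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_fit = neighbour, neighbour_fit
            if current_fit < best_fit:
                best, best_fit = list(current), current_fit
        temp *= cooling  # geometric cooling schedule (an assumption)
    return best, best_fit


# Start from the full feature set and let SA prune it.
start = [1] * 8
best, best_fit = sa_local_search(start, toy_fitness)
print(best, best_fit)
```

In a hybrid scheme of the kind the abstract describes, a routine like this would be called on candidate solutions produced by the global optimizer, so that the global search explores the space while SA refines individual feature subsets.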