Binary Simulated Normal Distribution Optimizer for feature selection: Theory and application in COVID-19 datasets.

Authors

Ahmed Shameem, Sheikh Khalid Hassan, Mirjalili Seyedali, Sarkar Ram

Affiliations

Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India.

King Abdulaziz University, Jeddah, Saudi Arabia.

Publication information

Expert Syst Appl. 2022 Aug 15;200:116834. doi: 10.1016/j.eswa.2022.116834. Epub 2022 Mar 15.

Abstract

The classification accuracy achieved by a machine learning technique depends on the feature set used in the learning process. However, not all of the features extracted for a particular task contribute to classification. Feature selection (FS) is an important and challenging pre-processing technique that discards unnecessary and irrelevant features, reducing computational time and space requirements while increasing classification accuracy. Generalized Normal Distribution Optimizer (GNDO), a recently proposed meta-heuristic algorithm, can be applied to optimization problems in general. In this paper, a hybrid of GNDO and Simulated Annealing (SA), called Binary Simulated Normal Distribution Optimizer (BSNDO), is proposed; it uses SA as a local search to achieve higher classification accuracy. The proposed method is evaluated on 18 well-known UCI datasets and compared with its predecessor as well as some popular FS methods. It is also tested on high-dimensional microarray datasets to assess its value on real-life data, and applied to a COVID-19 dataset for classification. The obtained results support the usefulness of BSNDO as an FS method. The source code of this work is publicly available at https://github.com/ahmed-shameem/Feature_selection.
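
To make the role of SA as a local search more concrete, the following is a minimal sketch of simulated annealing refining a binary feature mask. It is not the authors' implementation (their code is in the repository linked in the abstract), and the GNDO population update is omitted entirely. The names `sa_local_search` and `toy_fitness`, the parameters `t0`, `cooling`, and `steps`, and the one-bit-flip neighbourhood are all assumptions made for illustration. In a real FS setting the fitness would typically be a classifier's validation error combined with a penalty on the number of selected features.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature selected, 0 = feature discarded


def sa_local_search(
    mask: Mask,
    fitness: Callable[[Mask], float],
    rng: random.Random,
    t0: float = 1.0,        # assumed initial temperature
    cooling: float = 0.95,  # assumed geometric cooling rate
    steps: int = 200,       # assumed number of iterations
) -> Tuple[Mask, float]:
    """Refine a binary feature mask with simulated annealing.

    Lower fitness is better. Returns the best mask seen and its fitness.
    """
    current = list(mask)
    current_fit = fitness(current)
    best, best_fit = list(current), current_fit
    temp = t0
    for _ in range(steps):
        # Neighbour: flip one randomly chosen bit of the current mask.
        neighbour = list(current)
        i = rng.randrange(len(neighbour))
        neighbour[i] = 1 - neighbour[i]
        neighbour_fit = fitness(neighbour)
        delta = neighbour_fit - current_fit
        # Always accept improvements; accept a worse neighbour with
        # probability exp(-delta / temp), which shrinks as temp cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_fit = neighbour, neighbour_fit
            if current_fit < best_fit:
                best, best_fit = list(current), current_fit
        temp *= cooling
    return best, best_fit


def toy_fitness(mask: Mask) -> float:
    """Hypothetical stand-in for classifier error plus a size penalty.

    Pretends features 0, 3 and 5 are the only relevant ones.
    """
    relevant = {0, 3, 5}
    missed = sum(1 for i in relevant if mask[i] == 0)
    extra = sum(1 for i, bit in enumerate(mask) if bit and i not in relevant)
    return missed + 0.1 * extra


if __name__ == "__main__":
    rng = random.Random(0)
    start = [1] * 8  # begin with every feature selected
    best, best_fit = sa_local_search(start, toy_fitness, rng)
    print(best, best_fit)
```

In a hybrid scheme of the kind the abstract describes, a routine like this would be called on candidate solutions produced by the global optimizer, so that the global search explores broadly while SA refines each candidate locally.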

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0be/9396289/1532c548d1ab/fx1001_lrg.jpg
