Department of Informatics and Computing, Singidunum University, Belgrade, Serbia.
Department of Computer Engineering and Technology, Guru Nanak Dev University, Amritsar, India.
J Comput Biol. 2022 Jun;29(6):515-529. doi: 10.1089/cmb.2021.0256. Epub 2022 Apr 19.
Data sets with a large number of features are very high-dimensional. Feature selection reduces the dimensionality of the data, improves predictive performance, and shortens computation time. It is the process of selecting the optimal subset of input features from a given data set in order to reduce noise and retain the relevant features. The optimal feature subset contains all useful and relevant features and excludes irrelevant ones, which allows machine learning models to better understand and more efficiently differentiate the patterns in data sets. In this article, we propose a binary hybrid metaheuristic-based algorithm for selecting the optimal feature subset. Concretely, the brain storm optimization algorithm is hybridized with the firefly algorithm and adopted as a wrapper method for feature selection problems on classification data sets. The proposed algorithm is evaluated on 21 data sets and compared with 11 metaheuristic algorithms. In addition, the proposed method is applied to a coronavirus disease data set. The experimental results substantiate the robustness of the proposed hybrid algorithm: it efficiently reduces the feature subset while achieving higher classification accuracy than other methods in the literature.
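To make the wrapper setting concrete, the following is a minimal sketch of how a binary feature mask can be scored by a classifier and improved by a search procedure. It is not the hybrid brain storm optimization and firefly algorithm summarized above: the search is a plain random bit-flip hill climb, swapped in for brevity. The leave-one-out 1-nearest-neighbour classifier, the weighted fitness with `alpha=0.99`, the toy data, and all function names are assumptions made for illustration and are not taken from the article.

```python
import random

def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier restricted to features with mask bit 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        best_d, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_label = d, y[k]
        if best_label == y[i]:
            correct += 1
    return correct / len(X)

def fitness(X, y, mask, alpha=0.99):
    """Assumed wrapper objective (lower is better): error rate plus a small subset-size penalty."""
    error = 1.0 - loo_1nn_accuracy(X, y, mask)
    ratio = sum(mask) / len(mask)
    return alpha * error + (1 - alpha) * ratio

def bit_flip_search(X, y, n_iter=200, seed=0):
    """Stand-in search: flip one random bit and keep the candidate if it is no worse."""
    rng = random.Random(seed)
    n = len(X[0])
    best = [1] * n  # start from the full feature set
    best_fit = fitness(X, y, best)
    for _ in range(n_iter):
        cand = best[:]
        j = rng.randrange(n)
        cand[j] = 1 - cand[j]
        f = fitness(X, y, cand)
        if f <= best_fit:
            best, best_fit = cand, f
    return best, best_fit

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [0.2, 4.0], [1.0, -4.0], [1.1, 5.0], [1.2, -3.0]]
y = [0, 0, 0, 1, 1, 1]

best_mask, best_fit = bit_flip_search(X, y)
print(best_mask, best_fit)
```

A population-based metaheuristic would replace `bit_flip_search` while reusing the same `fitness` function, which is what makes the approach a wrapper: the classifier itself judges every candidate subset.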