A Hybrid Feature Selection Method Based on Binary State Transition Algorithm and ReliefF.
Publication information
IEEE J Biomed Health Inform. 2019 Sep;23(5):1888-1898. doi: 10.1109/JBHI.2018.2872811. Epub 2018 Sep 28.
Feature selection problems often arise in data-mining applications and are difficult to handle because they are NP-hard. In this study, a simple but efficient hybrid feature selection method based on the binary state transition algorithm and ReliefF, called ReliefF-BSTA, is proposed. The method consists of two phases: a filter phase and a wrapper phase. It offers three advantages. First, an initialization approach based on feature ranking is designed so that the initial solution is less likely to become trapped in a local optimum. Second, a probability substitute operator based on feature weights is developed to update the current solution according to the different mutation probabilities of the features. Finally, a new selection strategy based on relative dominance is presented to find the current best solution. The simple and efficient k-nearest neighbor (k-NN) algorithm with leave-one-out cross-validation is used as the classifier to evaluate candidate feature subsets. Experiments on seven public datasets and several real biomedical datasets indicate that the proposed method is more effective in terms of classification accuracy than the other feature selection methods compared. On the public datasets, the proposed method improved the average classification accuracy by about 2.5% compared with the filter method. On a specific biomedical dataset, AID1284, the classification accuracy increased significantly from 77.24% to 85.25% with the proposed method.
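The filter phase relies on ReliefF feature weights. The abstract does not state which ReliefF variant is used, so the following is a minimal sketch of the standard ReliefF update: for each instance, the k nearest hits lower a feature's weight and the k nearest misses from every other class raise it, scaled by class priors. It assumes numeric features whose differences are normalized by each feature's range. As a simplification, it visits every instance once instead of sampling m random instances, which makes the result deterministic. The name `relieff` and the default `k=3` are illustrative choices, not taken from the paper.

```python
from collections import Counter


def relieff(X, y, k=3):
    """Standard ReliefF weights (sketch). Every instance is used once
    instead of m random samples, so the output is deterministic."""
    n, d = len(X), len(X[0])
    # Per-feature ranges for the numeric diff() function (0 range -> 1.0).
    lo = [min(row[a] for row in X) for a in range(d)]
    hi = [max(row[a] for row in X) for a in range(d)]
    span = [(hi[a] - lo[a]) or 1.0 for a in range(d)]

    def diff(a, r1, r2):
        # Range-normalized absolute difference on feature a, in [0, 1].
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        # Manhattan distance over all range-normalized features.
        return sum(diff(a, r1, r2) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        ri, ci = X[i], y[i]
        # Other instances grouped by class, nearest first (R_i excluded).
        by_class = {}
        for j in sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(ri, X[j])):
            by_class.setdefault(y[j], []).append(X[j])
        hits = by_class.get(ci, [])[:k]
        for a in range(d):
            # Nearest hits that differ on feature a lower its weight.
            if hits:
                w[a] -= sum(diff(a, ri, h) for h in hits) / (n * len(hits))
            # Nearest misses of each other class raise it, prior-weighted.
            for c, rows in by_class.items():
                if c == ci:
                    continue
                misses = rows[:k]
                scale = prior[c] / (1.0 - prior[ci])
                w[a] += scale * sum(diff(a, ri, m) for m in misses) / (n * len(misses))
    return w
```

On a toy set where feature 0 alone separates the two classes and feature 1 is noise, feature 0 receives the larger weight, so ranking features by `relieff(X, y)` puts it first.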
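The abstract names a ranking-based initialization and a weight-driven probability substitute operator but gives no formulas for either. The sketch below is one hypothetical reading, not the paper's definition. The initial solution keeps the `m` highest-ranked features. The operator redraws each bit with probability `rate`, and a redrawn bit becomes 1 with a feature-specific probability obtained by min-max scaling the ReliefF weights into `[p_min, p_max]`. All names (`rank_init`, `selection_probs`, `substitute`) and parameters are assumptions for illustration.

```python
import random


def rank_init(weights, m):
    """Hypothetical ranking-based start: select the m highest-weight features."""
    order = sorted(range(len(weights)), key=lambda a: weights[a], reverse=True)
    top = set(order[:m])
    return [1 if a in top else 0 for a in range(len(weights))]


def selection_probs(weights, p_min=0.1, p_max=0.9):
    """Hypothetical mapping: min-max scale weights to per-feature
    probabilities of being selected, kept inside [p_min, p_max]."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [0.5] * len(weights)  # no ranking information: unbiased
    return [p_min + (p_max - p_min) * (w - lo) / (hi - lo) for w in weights]


def substitute(mask, probs, rate, rng):
    """Hypothetical weight-driven substitute operator: each bit is redrawn
    with probability `rate`; a redrawn bit becomes 1 with that feature's
    probability, so high-weight features tend to stay selected."""
    new = list(mask)
    for a, p in enumerate(probs):
        if rng.random() < rate:
            new[a] = 1 if rng.random() < p else 0
    return new
```

Clamping the probabilities to `[0.1, 0.9]` in this sketch is a design choice: low-weight features can still enter a candidate occasionally and high-weight ones can still be dropped, which keeps the search from collapsing onto the pure filter ranking.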