School of Computer Science and Communication Engineering, Jiangsu University, Xuefu Road, Zhenjiang, Jiangsu, China.
Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Zhenjiang, Jiangsu, China.
BMC Bioinformatics. 2019 Jun 10;20(Suppl 8):289. doi: 10.1186/s12859-019-2773-x.
Gene selection is one of the critical steps in the classification of microarray data. Because particle swarm optimization has no complicated evolutionary operators and few parameters to adjust, it has been used increasingly as an effective technique for gene selection. However, particle swarm optimization is prone to premature convergence to local minima, so some particle swarm optimization based gene selection methods may select non-optimal genes with high probability. Selecting predictive genes with low redundancy without filtering out key genes remains a challenge.
To obtain predictive genes with lower redundancy and to overcome the deficiencies of traditional particle swarm optimization based gene selection methods, this paper proposes a hybrid gene selection method based on a gene scoring strategy and an improved particle swarm optimization. To select the genes highly related to the samples' classes, a gene scoring strategy based on randomization and extreme learning machine is proposed to filter out many irrelevant genes. With the third-level gene pool established by a multiple-filter strategy, an improved particle swarm optimization is proposed to perform gene selection. In the improved particle swarm optimization, to decrease the likelihood of premature convergence of the swarm, the Metropolis criterion of the simulated annealing algorithm is introduced to update the particles, and half of the swarm is reinitialized when the swarm is trapped in a local minimum.
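The two modifications described above (Metropolis-based particle updates and reinitializing half of the swarm on stagnation) can be illustrated with a minimal sketch. The abstract does not give the update equations, coefficients, cooling schedule, or stagnation test, so everything below is an assumption chosen for illustration: the names `metropolis_accept`, `reinitialize_half`, `improved_pso_sketch`, and `toy_fitness`, the inertia and acceleration values, the geometric cooling, and the "no global improvement for `stall_limit` iterations" rule. Fitness is minimized, and a gene counts as selected when its coordinate exceeds 0.5.

```python
import math
import random

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical "ideal" gene mask for the toy problem


def toy_fitness(position):
    """Toy objective (lower is better): mismatches between the thresholded position and TARGET."""
    return sum(int(x > 0.5) != t for x, t in zip(position, TARGET))


def metropolis_accept(old_fitness, new_fitness, temperature, rng):
    """Metropolis criterion: always accept improvements, sometimes accept worse moves."""
    if new_fitness <= old_fitness:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-(new_fitness - old_fitness) / temperature)


def reinitialize_half(positions, n_dims, rng):
    """Re-draw half of the particles uniformly in [0, 1]^n_dims; return their indices."""
    chosen = rng.sample(range(len(positions)), len(positions) // 2)
    for i in chosen:
        positions[i] = [rng.random() for _ in range(n_dims)]
    return chosen


def improved_pso_sketch(fitness, n_dims, n_particles=10, n_iters=50,
                        t0=1.0, cooling=0.95, stall_limit=5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(n_dims)] for _ in range(n_particles)]
    vel = [[0.0] * n_dims for _ in range(n_particles)]
    cur_fit = [fitness(p) for p in pos]
    pbest, pbest_fit = [p[:] for p in pos], cur_fit[:]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    temperature, stall = t0, 0

    for _ in range(n_iters):
        improved = False
        for i in range(n_particles):
            # Standard PSO velocity update (coefficients 0.7 / 1.5 / 1.5 are assumed).
            cand_vel = [0.7 * vel[i][d]
                        + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                        + 1.5 * rng.random() * (gbest[d] - pos[i][d])
                        for d in range(n_dims)]
            cand = [min(1.0, max(0.0, pos[i][d] + cand_vel[d])) for d in range(n_dims)]
            cand_fit = fitness(cand)
            # The move is applied only if the Metropolis criterion accepts it.
            if metropolis_accept(cur_fit[i], cand_fit, temperature, rng):
                pos[i], vel[i], cur_fit[i] = cand, cand_vel, cand_fit
            if cur_fit[i] < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], cur_fit[i]
            if pbest_fit[i] < gbest_fit:
                gbest, gbest_fit = pbest[i][:], pbest_fit[i]
                improved = True
        temperature *= cooling  # geometric cooling (assumed schedule)
        stall = 0 if improved else stall + 1
        if stall >= stall_limit:  # assumed stagnation test
            for i in reinitialize_half(pos, n_dims, rng):
                vel[i] = [0.0] * n_dims
                cur_fit[i] = fitness(pos[i])
            stall = 0
    return gbest, gbest_fit


best_position, best_fitness = improved_pso_sketch(toy_fitness, n_dims=len(TARGET))
selected_genes = [d for d, x in enumerate(best_position) if x > 0.5]
```

In a real setting the toy objective would be replaced by a classifier-based criterion evaluated on the candidate gene subset, such as the error of an extreme learning machine or support vector machine; that substitution is likewise an assumption about how the fitness would be defined.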
By combining the gene scoring strategy with the improved particle swarm optimization, the new method can select functional gene subsets that are significantly sensitive to the samples' classes. With the few discriminative genes selected by the proposed method, extreme learning machine and support vector machine classifiers achieve high prediction accuracy on several public microarray datasets, which in turn verifies the efficiency and effectiveness of the proposed gene selection method.