Shahbeig Saleh, Rahideh Akbar, Helfroush Mohammad Sadegh, Kazemi Kamran
Department of Electrical and Electronics Engineering, Shiraz University of Technology, Shiraz, Iran.
IET Syst Biol. 2018 Aug;12(4):162-169. doi: 10.1049/iet-syb.2017.0044.
Here, a two-phase search strategy is proposed to identify biomarkers in a gene expression data set for prostate cancer diagnosis. A statistical filtering method is first applied to remove the noisiest data. In the first phase, a multi-objective optimisation based on the binary particle swarm optimisation algorithm, tuned by a chaotic method, selects an optimal gene subset that minimises the number of genes while maximising classification accuracy. In the second phase, a cache-based modification of the sequential forward floating selection algorithm finds the most discriminant genes within the subset selected in the first phase. Applied to an available, challenging prostate cancer data set, the proposed algorithm identifies the informative genes such that classification accuracy, sensitivity, and specificity of 100% are achieved with only nine biomarkers.
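The first phase can be illustrated with a minimal sketch of a binary particle swarm optimisation whose inertia weight is driven by a chaotic sequence. It is not the authors' implementation: the abstract does not state which chaotic method, classifier, or multi-objective formulation was used. The logistic map, the leave-one-out nearest-centroid classifier, the weighted-sum fitness with weight `alpha`, the synthetic data, and all function names (`make_toy_data`, `fitness`, `chaotic_bpso`) are assumptions made for illustration only.

```python
import math
import random


def logistic_map(x, r=4.0):
    """One step of the logistic map (assumed chaotic sequence)."""
    return r * x * (1.0 - x)


def make_toy_data(n_samples=20, n_features=10, seed=1):
    """Synthetic two-class data; only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label
        row[1] -= 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed)."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums = {0: [0.0] * len(idx), 1: [0.0] * len(idx)}
        counts = {0: 0, 1: 0}
        for k in range(len(X)):
            if k == i:
                continue  # hold out sample i
            counts[y[k]] += 1
            for t, j in enumerate(idx):
                sums[y[k]][t] += X[k][j]
        dists = {
            c: sum((X[i][j] - sums[c][t] / counts[c]) ** 2
                   for t, j in enumerate(idx))
            for c in (0, 1)
        }
        predicted = 0 if dists[0] <= dists[1] else 1
        correct += predicted == y[i]
    return correct / len(X)


def fitness(X, y, mask, alpha):
    """Weighted sum of accuracy and sparsity (assumed scalarisation)."""
    sparsity = 1.0 - sum(mask) / len(mask)
    return alpha * loo_accuracy(X, y, mask) + (1.0 - alpha) * sparsity


def chaotic_bpso(X, y, n_particles=12, n_iter=30, alpha=0.9, seed=0):
    """Binary PSO over gene masks with a chaotic inertia weight."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n_feat)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_feat for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p, alpha) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    chaos = 0.7  # start away from the fixed points of the logistic map
    for _ in range(n_iter):
        chaos = logistic_map(chaos)
        w = 0.4 + 0.5 * chaos  # inertia weight in [0.4, 0.9]
        for i in range(n_particles):
            for j in range(n_feat):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + 2.0 * r1 * (pbest[i][j] - pos[i][j])
                     + 2.0 * r2 * (gbest[j] - pos[i][j]))
                v = max(-4.0, min(4.0, v))  # clamp velocity
                vel[i][j] = v
                # A sigmoid turns the velocity into a bit-set probability.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(X, y, pos[i], alpha)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
            if f > gbest_fit:
                gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

Calling `chaotic_bpso(*make_toy_data())` returns a 0/1 mask over the features together with its fitness. In a pipeline like the one described, the selected subset would then be passed to the second-phase floating search, which this sketch does not cover.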