Kimura Shuhei, Fukutomi Ryo, Tokuhisa Masato, Okada Mariko
Faculty of Engineering, Tottori University, Tottori, Japan.
Graduate School of Sustainability Science, Tottori University, Tottori, Japan.
Front Genet. 2020 Dec 15;11:595912. doi: 10.3389/fgene.2020.595912. eCollection 2020.
Several researchers have focused on random-forest-based inference methods because of their excellent performance. Some of these methods can also analyze both time-series and static gene expression data. However, they only rank the candidate regulations by assigning confidence values to them; none can detect which regulations actually affect a gene of interest. In this study, we propose a method that removes unpromising candidate regulations by combining a random-forest-based inference method with a series of feature selection methods. In addition to detecting unpromising regulations, the proposed method uses the outputs of the feature selection methods to adjust the confidence values that the random-forest-based inference method computes for all of the candidate regulations. In numerical experiments, combining the inference method with the feature selection methods improved its performance in 99 of the 100 trials performed on the artificial problems. However, the improvement tends to be small, since the combined method succeeded in removing at most 19% of the candidate regulations. The combination also increases the computational cost. While a larger improvement at a lower computational cost would be ideal, we see these limitations as no impediment to our investigation, given that our aim is to extract as much useful information as possible from a limited amount of gene expression data.
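The combination described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how the feature selection outputs adjust the confidence values, so the rule used here (scale each confidence value by the fraction of feature selection methods that kept the regulator, and remove regulators that no method kept) is an assumption made purely for illustration. The function name `combine_with_feature_selection`, the gene names, and the numeric values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def combine_with_feature_selection(
    rf_confidence: Dict[str, float],
    selections: List[Set[str]],
) -> Tuple[Dict[str, float], List[str]]:
    """Adjust random-forest confidence values using feature selection outputs.

    rf_confidence: confidence value per candidate regulator of one target
        gene, as produced by some random-forest-based inference method.
    selections: one set per feature selection method, holding the
        regulators that method kept.

    ASSUMPTION (not from the paper): a regulator kept by no method is
    removed as unpromising; otherwise its confidence is multiplied by the
    fraction of methods that kept it.
    """
    if not selections:
        raise ValueError("at least one feature selection output is required")

    adjusted: Dict[str, float] = {}
    removed: List[str] = []
    for regulator, confidence in rf_confidence.items():
        # Count how many feature selection methods kept this regulator.
        votes = sum(regulator in kept for kept in selections)
        if votes == 0:
            removed.append(regulator)  # no method supports this regulation
        else:
            adjusted[regulator] = confidence * votes / len(selections)
    return adjusted, removed


# Hypothetical confidence values for four candidate regulators of one gene.
rf_confidence = {"g1": 0.40, "g2": 0.30, "g3": 0.20, "g4": 0.10}
# Hypothetical outputs of three feature selection methods.
selections = [{"g1", "g3"}, {"g1", "g2", "g3"}, {"g1", "g3"}]

adjusted, removed = combine_with_feature_selection(rf_confidence, selections)
ranking = sorted(adjusted, key=adjusted.get, reverse=True)
print("ranking:", ranking)
print("removed:", removed)
```

In this toy input, g2 starts with a higher confidence value than g3 but is kept by only one of the three methods, so it drops below g3 in the adjusted ranking, while g4 is removed outright. A multiplicative rule is only one possible choice; any scheme that lowers the confidence of weakly supported regulations would fit the same description.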