Inference of Genetic Networks From Time-Series and Static Gene Expression Data: Combining a Random-Forest-Based Inference Method With Feature Selection Methods.

Author information

Kimura Shuhei, Fukutomi Ryo, Tokuhisa Masato, Okada Mariko

Affiliations

Faculty of Engineering, Tottori University, Tottori, Japan.

Graduate School of Sustainability Science, Tottori University, Tottori, Japan.

Publication information

Front Genet. 2020 Dec 15;11:595912. doi: 10.3389/fgene.2020.595912. eCollection 2020.

Abstract

Several researchers have focused on random-forest-based inference methods because of their excellent performance. Some of these inference methods also have a useful ability to analyze both time-series and static gene expression data. However, they are only of use in ranking all of the candidate regulations by assigning them confidence values. None have been capable of detecting the regulations that actually affect a gene of interest. In this study, we propose a method to remove unpromising candidate regulations by combining the random-forest-based inference method with a series of feature selection methods. In addition to detecting unpromising regulations, our proposed method uses outputs from the feature selection methods to adjust the confidence values of all of the candidate regulations that have been computed by the random-forest-based inference method. Numerical experiments showed that the combined application with the feature selection methods improved the performance of the random-forest-based inference method on 99 of the 100 trials performed on the artificial problems. However, the improvement tends to be small, since our combined method succeeded in removing only 19% of the candidate regulations at most. The combined application with the feature selection methods moreover makes the computational cost higher. While a bigger improvement at a lower computational cost would be ideal, we see no impediments to our investigation, given that our aim is to extract as much useful information as possible from a limited amount of gene expression data.
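The abstract does not say how the feature selection outputs are used to adjust the confidence values, so the following is only a minimal sketch of one way such a combination step could look. It assumes the random-forest-based method has already produced a confidence value per candidate regulator, and that each feature selection run returns the set of regulators it kept. The vote-fraction scaling, the function names, and the gene labels are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def combine_confidences(rf_confidence: Dict[str, float],
                        selections: List[Set[str]]) -> Dict[str, float]:
    """Adjust random-forest confidence values using feature selection votes.

    ASSUMPTION (not from the paper): each confidence value is scaled by the
    fraction of feature selection runs that kept the regulator, and a
    regulator kept by no run is treated as unpromising and removed.
    """
    if not selections:
        # Without any feature selection output, leave the values unchanged.
        return dict(rf_confidence)
    adjusted = {}
    for regulator, confidence in rf_confidence.items():
        votes = sum(regulator in kept for kept in selections)
        if votes == 0:
            continue  # rejected by every selector: drop the candidate
        adjusted[regulator] = confidence * votes / len(selections)
    return adjusted


def rank(confidences: Dict[str, float]) -> List[str]:
    """Return candidate regulators ordered from most to least confident."""
    return sorted(confidences, key=confidences.get, reverse=True)


# Hypothetical confidence values for the regulators of one target gene.
rf_confidence = {"gene_A": 0.40, "gene_B": 0.35, "gene_C": 0.20, "gene_D": 0.05}
# Hypothetical outputs of three feature selection runs.
selections = [
    {"gene_A", "gene_B"},
    {"gene_A", "gene_C"},
    {"gene_A", "gene_B", "gene_C"},
]

adjusted = combine_confidences(rf_confidence, selections)
ranking = rank(adjusted)
print(ranking)
```

In this toy input, `gene_D` is kept by none of the runs and is dropped, while the remaining candidates keep their relative order with rescaled confidence values. A real pipeline would repeat this for every target gene and would need the actual adjustment rule described in the full paper.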

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/444a/7770182/71048105187a/fgene-11-595912-g0001.jpg
