Bessonov Kyrylo, Van Steen Kristel
Medical Genomics, GIGA-R, Université de Liège, Sart-Tilman, Belgium.
Genet Epidemiol. 2016 Dec;40(8):767-778. doi: 10.1002/gepi.22017. Epub 2016 Nov 11.
Gene regulatory network (GRN) inference is an active area of research that facilitates understanding of the complex interplay between biological molecules. We propose a novel framework to create such GRNs, based on Conditional Inference Forests (CIFs) as proposed by Strobl et al. Our framework consists of using ensembles of Conditional Inference Trees (CITs) and selecting an appropriate aggregation scheme for variable selection prior to network construction. We show on synthetic microarray data that taking the original implementation of CIFs with conditional permutation scheme (CIF ) may lead to improved performance compared to Breiman's implementation of Random Forests (RF). Among all newly introduced CIF-based methods and five network scenarios obtained from the DREAM4 challenge, CIF performed best. Networks derived from well-tuned CIFs, obtained by simply averaging P-values over tree ensembles (CIF ), are particularly attractive because they combine adequate performance with computational efficiency. Moreover, thresholds for variable selection are based on significance levels for P-values and, hence, do not need to be tuned. From a practical point of view, our extensive simulations show the potential advantages of CIF -based methods. Although more work is needed to improve speed, especially when fully exploiting the advantages of CITs in the context of heterogeneous and correlated data, we have shown that CIF methodology can be flexibly inserted in a framework to infer biological interactions. Notably, we confirmed a biologically relevant interaction between IL2RA and FOXP1, linked to the IL-2 signaling pathway and to type 1 diabetes.
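The aggregation idea described in the abstract (averaging P-values over a tree ensemble, then selecting variables at a fixed significance level) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `aggregate_edges`, the input layout (one list of per-tree P-values for each regulator-target pair), the gene labels, and the toy numbers are all assumptions made for illustration, and the step that produces per-tree P-values from conditional inference trees is not shown.

```python
from statistics import mean

def aggregate_edges(pvalues_per_tree, alpha=0.05):
    """Average per-tree P-values for each candidate edge and keep the
    edges whose mean P-value falls below the significance level alpha.

    pvalues_per_tree: dict mapping (regulator, target) to a list of
    P-values, one per tree in the ensemble (hypothetical input layout).
    Returns a list of (regulator, target, mean_p) sorted by mean_p.
    """
    edges = []
    for (regulator, target), pvals in pvalues_per_tree.items():
        mean_p = mean(pvals)
        # The threshold is a significance level, so it is fixed in
        # advance rather than tuned on the data.
        if mean_p < alpha:
            edges.append((regulator, target, mean_p))
    return sorted(edges, key=lambda edge: edge[2])

# Toy example with made-up P-values from a three-tree ensemble.
toy_pvalues = {
    ("G1", "G2"): [0.01, 0.03, 0.02],
    ("G3", "G2"): [0.40, 0.10, 0.70],
    ("G2", "G1"): [0.001, 0.002, 0.003],
}
network = aggregate_edges(toy_pvalues, alpha=0.05)
for regulator, target, mean_p in network:
    print(f"{regulator} -> {target} (mean P = {mean_p:.3f})")
```

In this sketch the only free parameter is `alpha`, which mirrors the abstract's point that selection thresholds follow from significance levels rather than from tuning.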