Deng Wenping, Zhang Kui, Busov Victor, Wei Hairong
School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, United States of America.
Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America.
PLoS One. 2017 Feb 3;12(2):e0171532. doi: 10.1371/journal.pone.0171532. eCollection 2017.
Present knowledge indicates that a multilayered hierarchical gene regulatory network (ML-hGRN) often operates above a biological pathway. Although the ML-hGRN is very important for understanding how a pathway is regulated, almost no computational algorithms exist for directly constructing ML-hGRNs.
A backward elimination random forest (BWERF) algorithm was developed for constructing the ML-hGRN operating above a biological pathway. For each pathway gene, BWERF used a random forest model to calculate the importance values of all transcription factors (TFs) to that gene recursively. In each round of modeling, a portion (e.g., 1/10) of the least important TFs was excluded, and the importance values of the remaining TFs to the pathway gene were updated and re-ranked, until only one TF remained in the list. Next, the importance values of each TF to all pathway genes were aggregated and fitted to a Gaussian mixture model to determine which TFs were retained in the regulatory layer immediately above the pathway layer. The TFs acquired at this second layer were then set as the new bottom layer to infer the next upper layer, and this process was repeated until an ML-hGRN with the expected number of layers was obtained.
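The per-gene backward elimination loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_` as the random forest importance measure. It also assumes a reporting scheme in which TFs are ranked by elimination order (last survivor first) and each TF keeps the importance from the last round it took part in. The function name `bwe_rf_rank` and the synthetic data are hypothetical; only the 1/10 drop fraction comes from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def bwe_rf_rank(X, y, tf_names, drop_frac=0.1, n_estimators=100, seed=0):
    """Hypothetical backward-elimination random forest ranking for one pathway gene.

    X: (n_samples, n_tfs) TF expression matrix; y: (n_samples,) pathway gene expression.
    Returns (ranking, importance): TF names from the last survivor (most important)
    to the first eliminated, and each TF's importance in the last round it was
    still in the model. This reporting scheme is an assumption for illustration.
    """
    remaining = list(range(X.shape[1]))
    importance = {}
    eliminated = []  # least important TFs are appended first
    while len(remaining) > 1:
        rf = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        rf.fit(X[:, remaining], y)
        imp = rf.feature_importances_  # recomputed on the reduced TF set each round
        for col, value in zip(remaining, imp):
            importance[tf_names[col]] = float(value)
        # Drop about drop_frac of the current TFs: at least one, never all of them.
        n_drop = min(max(1, int(len(remaining) * drop_frac)), len(remaining) - 1)
        worst = [int(i) for i in np.argsort(imp)[:n_drop]]  # ascending importance
        eliminated.extend(tf_names[remaining[i]] for i in worst)
        worst_set = set(worst)
        remaining = [c for i, c in enumerate(remaining) if i not in worst_set]
    eliminated.append(tf_names[remaining[0]])  # the single TF left in the list
    return eliminated[::-1], importance


# Synthetic example: the gene depends strongly on TF0, weakly on TF1, not on TF2..TF19.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.1 * rng.normal(size=200)
names = [f"TF{i}" for i in range(20)]
ranking, importance = bwe_rf_rank(X, y, names)
print(ranking[:3])
```

Refitting the forest on the shrinking TF set is what distinguishes this from a single-pass ranking. Noise TFs are removed early, so they no longer dilute the importance of the remaining candidates.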
BWERF improved the accuracy of ML-hGRN construction because it used backward elimination to exclude noise genes and aggregated the individual importance values to determine TF retention. We validated BWERF by using it to construct ML-hGRNs operating above the mouse pluripotency maintenance pathway and the Arabidopsis lignocellulosic pathway. Compared with GENIE3, BWERF showed an improvement in recognizing authentic TFs regulating a pathway. Compared with the bottom-up Gaussian graphical model algorithm we developed for constructing ML-hGRNs, BWERF constructed ML-hGRNs with significantly fewer edges, enabling biologists to choose the implicit edges for experimental validation.
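The TF-retention step can be sketched as follows. In this step, each TF's importance values across all pathway genes are aggregated and fitted to a Gaussian mixture model. Several details here are assumptions, not taken from the source: aggregation is a plain sum, the mixture has two components, and a TF is retained when its total is more likely to come from the higher-mean component. The EM fit is written out with the Python standard library only, so that the mechanics are visible. The names `fit_two_component_gmm` and `retain_tfs` and the importance values in the example are hypothetical.

```python
import math
import statistics


def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(values, n_iter=200, min_var=1e-6):
    """Fit a 1-D two-component Gaussian mixture by expectation-maximization (EM)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise one component on the lower half of the data and one on the upper half.
    mu = [statistics.fmean(xs[:half]), statistics.fmean(xs[half:])]
    var = [max(statistics.pvariance(xs), min_var)] * 2
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component for each value.
        resp = []
        for x in values:
            p = [weight[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weight[k] = nk / len(values)
            mu[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, values)) / nk, min_var)
    return weight, mu, var, resp


def retain_tfs(importance_by_gene):
    """Sum each TF's importance over all pathway genes and keep the TFs assigned
    to the higher-mean mixture component (hypothetical retention rule)."""
    totals = {}
    for per_tf in importance_by_gene.values():
        for tf, value in per_tf.items():
            totals[tf] = totals.get(tf, 0.0) + value
    tfs = sorted(totals)
    _, mu, _, resp = fit_two_component_gmm([totals[tf] for tf in tfs])
    high = 0 if mu[0] > mu[1] else 1
    return [tf for tf, r in zip(tfs, resp) if r[high] > 0.5]


# Hypothetical importance values of seven TFs for three pathway genes.
example = {
    "G1": {"TFa": 0.45, "TFb": 0.30, "TFc": 0.05, "TFd": 0.04, "TFe": 0.06, "TFf": 0.03, "TFg": 0.07},
    "G2": {"TFa": 0.40, "TFb": 0.38, "TFc": 0.04, "TFd": 0.05, "TFe": 0.03, "TFf": 0.06, "TFg": 0.04},
    "G3": {"TFa": 0.35, "TFb": 0.42, "TFc": 0.06, "TFd": 0.03, "TFe": 0.05, "TFf": 0.04, "TFg": 0.05},
}
retained = retain_tfs(example)
print(retained)
```

A mixture model sets the cut-off from the data, so no fixed importance threshold is needed. In a layered run, the retained TFs would become the targets for inferring the next layer up.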