Petralia Francesca, Wang Pei, Yang Jialiang, Tu Zhidong
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Bioinformatics. 2015 Jun 15;31(12):i197-205. doi: 10.1093/bioinformatics/btv268.
Gene regulatory network (GRN) inference based on genomic data is one of the most actively pursued problems in computational biology. Because different types of biological data usually provide complementary information regarding the underlying GRN, a model that integrates large datasets of diverse types is expected to increase both the power and accuracy of GRN inference. Towards this goal, we propose a novel algorithm named iRafNet: integrative random forest for gene regulatory network inference.
iRafNet is a flexible, unified integrative framework that allows information from heterogeneous data, such as protein-protein interactions, transcription factor (TF)-DNA binding, and gene knock-down, to be jointly considered for GRN inference. Using test data from the DREAM4 and DREAM5 challenges, we demonstrate that iRafNet outperforms the original random-forest-based network inference algorithm (GENIE3) and is highly comparable to the community learning approach. We apply iRafNet to construct a GRN in Saccharomyces cerevisiae and demonstrate that it improves performance in predicting TF-target gene regulations and provides additional functional insights into the predicted gene regulations.
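As a rough illustration of how heterogeneous evidence might be "jointly considered", the sketch below merges per-source support scores into a single weight per candidate regulator and then draws candidates in proportion to that weight. This is only one plausible reading, not the published implementation: the averaging rule, the `floor` parameter, the function names, and the toy scores for `TF1` to `TF4` are all assumptions made for the example, and the sketch is written in Python rather than the R of the released code.

```python
import random

def combine_evidence(sources, regulators, floor=1e-3):
    """Average per-source scores (assumed to lie in [0, 1]) into one weight per regulator.

    `floor` (hypothetical) keeps regulators with no prior support samplable.
    """
    weights = {}
    for reg in regulators:
        scores = [src.get(reg, 0.0) for src in sources]
        weights[reg] = max(sum(scores) / len(scores), floor)
    return weights

def sample_candidates(weights, k, rng):
    """Draw k distinct regulators; each draw is proportional to the remaining weights."""
    pool = dict(weights)
    chosen = []
    for _ in range(min(k, len(pool))):
        names = list(pool)
        pick = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen

# Toy evidence for one target gene (made-up numbers, for illustration only).
ppi = {"TF1": 0.9, "TF2": 0.1}        # e.g. protein-protein interaction support
binding = {"TF1": 0.8, "TF3": 0.4}    # e.g. TF-DNA binding support
regulators = ["TF1", "TF2", "TF3", "TF4"]

weights = combine_evidence([ppi, binding], regulators)

# Count how often each regulator is offered as a candidate over many draws.
rng = random.Random(0)
counts = {r: 0 for r in regulators}
for _ in range(2000):
    for r in sample_candidates(weights, 2, rng):
        counts[r] += 1
```

In a forest-based setting, a draw like `sample_candidates` could replace the uniform choice of candidate predictors at each tree node, so that regulators backed by prior data are tested more often while unsupported ones remain possible.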
The R code of iRafNet implementation and a tutorial are available at: http://research.mssm.edu/tulab/software/irafnet.html