Computational Biology Program, New York University Sackler School of Medicine, New York, New York, United States of America.
PLoS One. 2010 Oct 25;5(10):e13397. doi: 10.1371/journal.pone.0013397.
Current technologies have led to the availability of multiple genomic data types in sufficient quantity and quality to serve as a basis for automatic global network inference. Accordingly, a wide variety of network inference methods now exists, learning regulatory networks at varying levels of detail. These methods have different strengths and weaknesses and can therefore be complementary. However, combining different methods in a mutually reinforcing manner remains a challenge.
We investigate how three scalable methods can be combined into a useful network inference pipeline. The first is a novel t-test-based method that relies on a comprehensive steady-state knock-out dataset to rank regulatory interactions. The other two are previously published methods based on mutual information and on ordinary differential equations (tlCLR and Inferelator 1.0, respectively) that use both time-series and steady-state data to rank regulatory interactions; the latter also infers dynamic models of gene regulation that can be used to predict the system's response to new perturbations.
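The abstract does not give the exact statistic behind the t-test-based method, so the following is only a minimal sketch of the general idea under stated assumptions: each knocked-out gene is treated as a candidate regulator, and every other gene is scored by how far its expression in that single knock-out profile deviates from a set of wild-type replicates. The statistic used here is the standard t for one new observation against a reference sample, (x − mean) / (s·√(1 + 1/n)). The function name `knockout_scores` and the dictionary-based data layout are hypothetical choices for illustration, not the authors' code.

```python
import math
import statistics
from typing import Dict, List, Tuple


def knockout_scores(
    wild_type: List[Dict[str, float]],
    knockouts: Dict[str, Dict[str, float]],
) -> List[Tuple[str, str, float]]:
    """Rank candidate edges regulator -> target from steady-state knock-outs.

    wild_type: replicate wild-type profiles (gene -> expression level); needs >= 2.
    knockouts: knocked-out gene -> expression profile measured in that mutant.
    Returns (regulator, target, t) tuples sorted by decreasing |t|.
    """
    genes = list(wild_type[0])
    n = len(wild_type)
    mean = {g: statistics.mean(p[g] for p in wild_type) for g in genes}
    sd = {g: statistics.stdev(p[g] for p in wild_type) for g in genes}
    # Inflation factor for testing a single new observation against n replicates.
    scale = math.sqrt(1.0 + 1.0 / n)

    edges = []
    for regulator, profile in knockouts.items():
        for target in genes:
            if target == regulator:
                continue  # the knocked-out gene trivially changes itself
            denom = max(sd[target], 1e-12) * scale  # guard against zero variance
            t = (profile[target] - mean[target]) / denom
            edges.append((regulator, target, t))

    # Largest deviations first: these are the most confident predicted edges.
    edges.sort(key=lambda e: abs(e[2]), reverse=True)
    return edges
```

For example, if knocking out gene A drops gene B far below its wild-type range while gene C barely moves, the edge A → B is ranked above A → C. Larger |t| only indicates a stronger response to the knock-out; it cannot by itself separate direct regulation from indirect effects.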
CONCLUSION/SIGNIFICANCE: Our t-test-based method proved powerful at ranking regulatory interactions, tying for first place in the DREAM4 100-gene in-silico network inference challenge. We demonstrate complementarity between this method and the two methods that exploit time-series data by combining all three into a pipeline that ranks regulatory interactions markedly better than any single method alone. Moreover, the pipeline accurately predicts the system's response to new conditions (in this case, new double knock-out genetic perturbations). Our evaluation of multiple network inference methods suggests avenues for future method development and yields simple considerations for genomic experimental design. Our code is publicly available at http://err.bio.nyu.edu/inferelator/.
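The abstract does not say how the three rankings are merged. As an assumption, the sketch below uses a simple average-rank (Borda-style) rule, one common way to combine ranked edge lists, to show how methods with different strengths can reinforce each other. It is not necessarily the pipeline's actual combination scheme, and the name `combine_rankings` is hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def combine_rankings(rankings: List[List[Edge]]) -> List[Edge]:
    """Merge ranked edge lists (most confident first) by mean rank.

    An edge absent from a ranking receives that ranking's worst rank,
    len(ranking), so methods that do not score an edge penalize it.
    """
    # Position of each edge within each individual ranking.
    positions: List[Dict[Edge, int]] = [
        {edge: i for i, edge in enumerate(r)} for r in rankings
    ]
    all_edges = set().union(*positions)

    def mean_rank(edge: Edge) -> float:
        return sum(p.get(edge, len(p)) for p in positions) / len(positions)

    # Lower mean rank = stronger consensus; ties broken by edge name for determinism.
    return sorted(all_edges, key=lambda e: (mean_rank(e), e))
```

Here an edge ranked highly by the knock-out method but only moderately by the time-series methods can still end up near the top. Averaging ranks rather than raw scores avoids having to put the t statistics, mutual-information scores, and ODE model weights on a common scale.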