Villaverde Alejandro F, Ross John, Morán Federico, Banga Julio R
Bioprocess Engineering Group, IIM-CSIC, Vigo, Spain.
Department of Chemistry, Stanford University, Stanford, California, United States of America.
PLoS One. 2014 May 7;9(5):e96732. doi: 10.1371/journal.pone.0096732. eCollection 2014.
The prediction of links among variables from a given dataset is a task referred to as network inference or reverse engineering. It is an open problem in bioinformatics and systems biology, as well as in other areas of science. Information theory, which uses concepts such as mutual information, provides a rigorous framework for addressing it. While a number of information-theoretic methods are already available, most of them focus on a particular type of problem, introducing assumptions that limit their generality. Furthermore, many of these methods lack a publicly available implementation. Here we present MIDER, a method for inferring network structures with information-theoretic concepts. It consists of two steps. First, it provides a representation of the network in which the distance among nodes indicates their statistical closeness. Second, it refines the prediction of the existing links to distinguish between direct and indirect interactions and to assign directionality. The method accepts as input time-series data related to some quantitative features of the network nodes (such as concentrations, if the nodes are chemical species). It takes into account time delays between variables, and allows choosing among several definitions and normalizations of mutual information. It is general purpose: it may be applied to any type of network, cellular or otherwise. A Matlab implementation including source code and data is freely available (http://www.iim.csic.es/~gingproc/mider.html). The performance of MIDER has been evaluated on seven different benchmark problems that cover the main types of cellular networks, including metabolic, gene regulatory, and signaling. Comparisons with state-of-the-art information-theoretic methods have demonstrated the competitive performance of MIDER, as well as its versatility. Its use does not demand any a priori knowledge from the user; the default settings and the adaptive nature of the method provide good results for a wide range of problems without requiring tuning.
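To make the idea of a mutual-information-based distance with time delays concrete, the following is a minimal Python sketch. It is not the MIDER implementation (which is distributed in Matlab at the URL above). The function names, the histogram estimator, the bin count, and the particular normalization d = 1 - I(X;Y)/H(X,Y) are all assumptions chosen for illustration; the abstract states only that several definitions and normalizations of mutual information are available.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in bits) of a histogram given as raw counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def lagged_mutual_information(x, y, lag=0, bins=8):
    """Histogram estimate of I(X_t; Y_{t+lag}) and of the joint entropy.

    Hypothetical helper: pairs x[t] with y[t + lag] and bins the pairs
    into a 2-D histogram with `bins` equal-width bins per axis.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    h_x = entropy(joint.sum(axis=1))   # marginal entropy of X
    h_y = entropy(joint.sum(axis=0))   # marginal entropy of Y
    h_xy = entropy(joint.ravel())      # joint entropy of (X, Y)
    return h_x + h_y - h_xy, h_xy

def best_lag_distance(x, y, max_lag=5, bins=8):
    """Scan lags 0..max_lag and return (smallest distance, its lag).

    Assumed normalization: d = 1 - I(X;Y) / H(X,Y), which is 0 for
    variables that determine each other and close to 1 for
    independent ones.
    """
    best = (1.0, 0)
    for lag in range(max_lag + 1):
        mi, h_xy = lagged_mutual_information(x, y, lag, bins)
        d = 1.0 - mi / h_xy if h_xy > 0 else 1.0
        if d < best[0]:
            best = (d, lag)
    return best

# Usage: y is a copy of x delayed by three samples.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.roll(x, 3)
distance, lag = best_lag_distance(x, y)
```

In this sketch a small distance at a nonzero lag suggests that one series follows the other with a delay. Distinguishing direct from indirect links and assigning direction, the second step described in the abstract, is not covered here.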