Zhang Xiujun, Zhao Juan, Hao Jin-Kao, Zhao Xing-Ming, Chen Luonan
Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Department of Mathematics, Xinyang Normal University, Xinyang 464000, China; School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore 637459, Singapore.
Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
Nucleic Acids Res. 2015 Mar 11;43(5):e31. doi: 10.1093/nar/gku1315. Epub 2014 Dec 24.
Mutual information (MI), a quantity describing the nonlinear dependence between two random variables, has been widely used to construct gene regulatory networks (GRNs). Despite its good performance, MI cannot separate direct regulations from indirect ones among genes. Although conditional mutual information (CMI) is able to identify direct regulations, it generally underestimates the regulation strength, i.e. it may result in false negatives when inferring gene regulations. To overcome these problems, we propose a novel concept, namely conditional mutual inclusive information (CMI2), to describe the regulations between genes. Furthermore, with CMI2, we develop a new approach, namely CMI2NI (CMI2-based network inference), for reverse-engineering GRNs. In CMI2NI, CMI2 is used to quantify the mutual information between two genes given a third one by calculating the Kullback-Leibler divergence between the postulated distributions that include and exclude the edge between the two genes. The benchmark results on the GRNs from the DREAM challenge as well as the SOS DNA repair network in Escherichia coli demonstrate the superior performance of CMI2NI. Specifically, even for gene expression data with small sample size, CMI2NI can not only infer the correct topology of the regulation networks but also accurately quantify the regulation strength between genes. As a case study, CMI2NI was also used to reconstruct cancer-specific GRNs using gene expression data from The Cancer Genome Atlas (TCGA). CMI2NI is freely accessible at http://www.comp-sysbio.org/cmi2ni.
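The contrast between MI and CMI described in the abstract can be illustrated with a minimal sketch. It is not the CMI2 or CMI2NI method itself: it only computes plain MI and CMI, and it assumes the expression values are jointly Gaussian so that both quantities reduce to covariance determinants. The function names `gaussian_mi` and `gaussian_cmi`, the simulated chain X -> Z -> Y, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def _cov_det(*variables):
    """Determinant of the sample covariance matrix of the given 1-D arrays."""
    cov = np.atleast_2d(np.cov(np.vstack(variables)))
    return np.linalg.det(cov)


def gaussian_mi(x, y):
    """MI(X;Y) under a joint Gaussian assumption (in nats)."""
    return 0.5 * np.log(_cov_det(x) * _cov_det(y) / _cov_det(x, y))


def gaussian_cmi(x, y, z):
    """CMI(X;Y|Z) under a joint Gaussian assumption (in nats)."""
    return 0.5 * np.log(
        _cov_det(x, z) * _cov_det(y, z) / (_cov_det(z) * _cov_det(x, y, z))
    )


# Simulated indirect regulation (illustrative data, not from the paper):
# X drives Z, and Z drives Y; there is no direct X -> Y edge.
rng = np.random.default_rng(0)
n_samples = 2000
x = rng.normal(size=n_samples)
z = x + 0.5 * rng.normal(size=n_samples)
y = z + 0.5 * rng.normal(size=n_samples)

mi_xy = gaussian_mi(x, y)                # clearly positive: X and Y are dependent
cmi_xy_given_z = gaussian_cmi(x, y, z)   # close to zero: dependence is explained by Z

print(f"MI(X;Y)    = {mi_xy:.4f}")
print(f"CMI(X;Y|Z) = {cmi_xy_given_z:.4f}")
```

In this toy chain, MI alone would suggest an X-Y edge, while conditioning on Z removes it. The underestimation problem that motivates CMI2 is not reproduced here.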