Luo Weijun, Hankenson Kurt D, Woolf Peter J
Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI 48109, USA.
BMC Bioinformatics. 2008 Nov 3;9:467. doi: 10.1186/1471-2105-9-467.
Probability-based statistical learning methods such as mutual information and Bayesian networks have emerged as a major category of tools for reverse engineering mechanistic relationships from quantitative biological data. In this work we introduce a new statistical learning strategy, MI3, that addresses three common issues in previous methods simultaneously: (1) handling of continuous variables, (2) detection of more complex three-way relationships, and (3) better differentiation of causal versus confounding relationships. With these improvements, we provide a more realistic representation of the underlying biological system.
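To illustrate why a three-way relationship can carry more information than the pairwise ones, the following minimal sketch compares two-way and three-way mutual information on continuous data. It is not the MI3 estimator: it assumes the variables are jointly Gaussian so that mutual information has a closed form in terms of covariance determinants, and the names `gaussian_mi`, `covariance_matrix`, and `det`, as well as the simulated variables `x`, `y`, and `z`, are hypothetical.

```python
import math
import random


def covariance_matrix(columns):
    """Sample covariance matrix of a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [[sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
             for cb, mb in zip(columns, means)]
            for ca, ma in zip(columns, means)]


def det(m):
    """Determinant by cofactor expansion (adequate for the 1x1 to 3x3 cases here)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def gaussian_mi(sources, target):
    """I(sources; target) in nats, ASSUMING joint Gaussianity (not the MI3 estimator)."""
    d_sources = det(covariance_matrix(sources))
    d_target = det(covariance_matrix([target]))
    d_joint = det(covariance_matrix(sources + [target]))
    return 0.5 * math.log(d_sources * d_target / d_joint)


# Simulated toy data: z depends jointly on two independent regulators x and y.
random.seed(0)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
z = [a + b + random.gauss(0, 0.5) for a, b in zip(x, y)]

mi_xz = gaussian_mi([x], z)      # two-way: I(X; Z)
mi_yz = gaussian_mi([y], z)      # two-way: I(Y; Z)
mi_xyz = gaussian_mi([x, y], z)  # three-way: I(X, Y; Z)

print(f"I(X;Z)   = {mi_xz:.3f} nats")
print(f"I(Y;Z)   = {mi_yz:.3f} nats")
print(f"I(X,Y;Z) = {mi_xyz:.3f} nats")
```

In this toy setup the joint term exceeds either pairwise term, which is the kind of signal a purely two-way analysis would underestimate. A Gaussian closed form is used only to keep the sketch short; it would miss nonlinear dependencies.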
We tested the MI3 algorithm using both synthetic and experimental data. In the synthetic data experiment, MI3 achieved an absolute sensitivity/precision of 0.77/0.83 and a relative sensitivity and precision of 0.99 each. In addition, MI3 significantly outperformed the control methods, including Bayesian networks, classical two-way mutual information, and a discrete version of MI3. We then used MI3 and the control methods to infer a regulatory network centered on the MYC transcription factor from a published microarray dataset. Models selected by MI3 were numerically and biologically distinct from those selected by the control methods. Unlike the control methods, MI3 effectively differentiated true causal models from confounding models. MI3 recovered major MYC cofactors and revealed major mechanisms involved in MYC-dependent transcriptional regulation, which are strongly supported by the literature. The MI3 network showed that limited sets of regulatory mechanisms are employed repeatedly to control the expression of a large number of genes.
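For readers unfamiliar with the evaluation metrics, the sketch below computes sensitivity and precision for an inferred network against a known one, using the standard definitions (true positives over all true relationships, and true positives over all predicted relationships). The abstract does not define how the "absolute" and "relative" variants differ, so this sketch does not attempt to reproduce that distinction; the function name `sensitivity_precision` and the toy edge sets are hypothetical.

```python
def sensitivity_precision(true_edges, predicted_edges):
    """Return (sensitivity, precision) for a predicted edge set.

    sensitivity = TP / (number of true edges)
    precision   = TP / (number of predicted edges)
    """
    true_edges = set(true_edges)
    predicted_edges = set(predicted_edges)
    tp = len(true_edges & predicted_edges)  # edges that are both true and predicted
    sensitivity = tp / len(true_edges) if true_edges else 0.0
    precision = tp / len(predicted_edges) if predicted_edges else 0.0
    return sensitivity, precision


# Toy example with made-up node names (not taken from the paper's data).
true_edges = {("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")}
predicted_edges = {("A", "C"), ("B", "C"), ("C", "D"), ("A", "E")}

sens, prec = sensitivity_precision(true_edges, predicted_edges)
print(f"sensitivity = {sens:.2f}, precision = {prec:.2f}")
```

Here three of the four true edges are recovered and three of the four predictions are correct, so both metrics come out at 0.75.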
Overall, our work demonstrates that MI3 outperforms the frequently used control methods, and provides a powerful method for inferring mechanistic relationships underlying biological and other complex systems. The MI3 method is implemented in R in the "mi3" package, available under the GNU GPL from http://sysbio.engin.umich.edu/~luow/downloads.php and from the R package archive CRAN.