Zhu Jun, Wiener Matthew C, Zhang Chunsheng, Fridman Arthur, Minch Eric, Lum Pek Y, Sachs Jeffrey R, Schadt Eric E
Rosetta Inpharmatics, Seattle, Washington, United States of America.
PLoS Comput Biol. 2007 Apr 13;3(4):e69. doi: 10.1371/journal.pcbi.0030069. Epub 2007 Feb 27.
To dissect common human diseases such as obesity and diabetes, a systematic approach is needed to study how genes interact with one another, and with genetic and environmental factors, to determine clinical end points or disease phenotypes. Bayesian networks provide a convenient framework for extracting relationships from noisy data and are frequently applied to large-scale data to derive causal relationships among variables of interest. Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. Increasing the number of experiments, or the number of subjects in an experiment, is an expensive and time-consuming way to improve network reconstruction. Integrating multiple types of data from existing subjects might be more efficient. For example, it has recently been demonstrated that combining genotypic and gene expression data in a segregating population leads to improved network reconstruction, which in turn may lead to better predictions of the effects of experimental perturbations on any given gene. Here we simulate data based on networks reconstructed from biological data collected in a segregating mouse population and quantify the improvement in network reconstruction achieved using genotypic and gene expression data, compared with reconstruction using gene expression data alone. 
We demonstrate that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.
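As a toy illustration of why genotypic data can help orient edges that expression data alone leaves ambiguous, the sketch below simulates a three-variable chain (genotype -> gene A -> gene B) and compares partial correlations. This is not the Bayesian network reconstruction procedure used in the study; the variable names, effect sizes, 1:2:1 genotype ratio, and the partial-correlation check are all assumptions chosen for illustration.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    design = np.column_stack([np.ones_like(z), z])
    resid_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    resid_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(resid_x, resid_y)[0, 1])


def simulate(n_subjects, seed=0):
    """Simulate a hypothetical chain: genotype -> gene_a -> gene_b."""
    rng = np.random.default_rng(seed)
    # Assumed allele count at one locus, drawn in a 1:2:1 ratio.
    genotype = rng.choice([0.0, 1.0, 2.0], size=n_subjects, p=[0.25, 0.5, 0.25])
    # Assumed linear effects with unit-variance Gaussian noise.
    gene_a = 1.0 * genotype + rng.normal(0.0, 1.0, n_subjects)
    gene_b = 0.8 * gene_a + rng.normal(0.0, 1.0, n_subjects)
    return genotype, gene_a, gene_b


genotype, gene_a, gene_b = simulate(2000)

# Expression data alone: the correlation is symmetric, so A -> B and
# B -> A cannot be told apart from this number.
expr_corr = float(np.corrcoef(gene_a, gene_b)[0, 1])

# With genotype as an upstream anchor, the two orientations differ:
# under genotype -> A -> B, genotype is independent of B given A,
# but genotype remains associated with A given B.
l_b_given_a = partial_corr(genotype, gene_b, gene_a)
l_a_given_b = partial_corr(genotype, gene_a, gene_b)

print(f"corr(A, B)               = {expr_corr:.3f}")
print(f"pcorr(genotype, B | A)   = {l_b_given_a:.3f}")
print(f"pcorr(genotype, A | B)   = {l_a_given_b:.3f}")
```

In this sketch the partial correlation of genotype with gene B given gene A should be close to zero, while the partial correlation of genotype with gene A given gene B should stay clearly positive, which is consistent with the simulated direction A -> B. Because genotype cannot be caused by expression, it supplies an asymmetry that the A-B correlation by itself does not.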