Gogoshin Grigoriy, Boerwinkle Eric, Rodin Andrei S
1 Diabetes and Metabolism Research Institute, City of Hope, Duarte, California.
2 Human Genetics Center, School of Public Health, University of Texas Health Science Center, Houston, Texas.
J Comput Biol. 2017 Apr;24(4):340-356. doi: 10.1089/cmb.2016.0100. Epub 2016 Sep 28.
Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic). It is especially relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixing data types, imputation and validation, and, in general, limited scalability in both reconstruction and visualization of BNs. To overcome these and other obstacles, we present BNOmics, an improved algorithm and software toolkit for inferring and analyzing BNs from omics datasets. BNOmics aims at comprehensive systems biology-type data exploration, including both generating new biological hypotheses and testing and validating existing ones. Novel aspects of the algorithm center around increasing scalability and applicability to varying data types (with different explicit and implicit distributional assumptions) within the same analysis framework. An output and visualization interface to widely available graph-rendering software is also included. Three diverse applications are detailed. BNOmics was originally developed in the context of genetic epidemiology data and is being continuously optimized to keep pace with the ever-increasing inflow of available large-scale omics datasets.
As such, scalability and usability of the software on less-than-exotic computer hardware are a priority, as is the applicability of the algorithm and software to heterogeneous datasets containing many data types, such as single-nucleotide polymorphisms and other genetic/epigenetic/transcriptome variables, metabolite levels, epidemiological variables, endpoints, and phenotypes.
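The abstract does not describe the BNOmics search algorithm or scoring criterion, so the following is only a generic illustration of score-based BN structure learning on discrete data, not the authors' method. It assumes greedy hill climbing over single edge additions with a BIC score, and it writes the result as Graphviz DOT text as one plausible way to hand a network to graph-rendering software. All names (`family_bic`, `creates_cycle`, `greedy_search`, `to_dot`, `toy_data`) are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def family_bic(data, child, parents):
    """BIC contribution of one node given a candidate parent set (discrete data)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_counts[cfg]) for (cfg, _), c in joint.items())
    # Assumption: the number of states per variable is taken from the observed data.
    child_states = len({row[child] for row in data})
    n_configs = 1
    for p in parents:
        n_configs *= len({row[p] for row in data})
    return loglik - 0.5 * math.log(n) * (child_states - 1) * n_configs


def creates_cycle(parents, src, dst):
    """True if adding src -> dst would close a directed cycle."""
    # A cycle appears exactly when dst is already an ancestor of src.
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def greedy_search(data, variables):
    """Add the single best-scoring edge repeatedly until no addition improves BIC."""
    parents = {v: [] for v in variables}
    scores = {v: family_bic(data, v, []) for v in variables}
    while True:
        best = None
        for src, dst in permutations(variables, 2):
            if src in parents[dst] or creates_cycle(parents, src, dst):
                continue
            gain = family_bic(data, dst, parents[dst] + [src]) - scores[dst]
            if gain > 1e-9 and (best is None or gain > best[0]):
                best = (gain, src, dst)
        if best is None:
            return parents
        gain, src, dst = best
        parents[dst].append(src)
        scores[dst] += gain


def to_dot(parents):
    """Serialize the learned structure as Graphviz DOT text."""
    lines = ["digraph BN {"]
    for child, pars in parents.items():
        lines.append(f'  "{child}";')
        for p in pars:
            lines.append(f'  "{p}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


# Toy data: B copies A, while C is independent of both.
toy_data = [
    {"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1) for _ in range(10)
]
learned = greedy_search(toy_data, ["A", "B", "C"])
print(to_dot(learned))
```

Greedy edge addition is only a local search and can miss the optimal structure, which is one face of the computational obstacle the abstract mentions. Because BIC cannot distinguish the direction of a single isolated edge, the orientation between `A` and `B` in this toy case is decided by iteration order alone.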