BERG Health, Framingham, Massachusetts, USA.
J Comput Biol. 2020 May;27(5):698-708. doi: 10.1089/cmb.2019.0210. Epub 2019 Sep 5.
Structural learning of Bayesian networks (BNs) from observational data is seeing growing use and attention across scientific and industrial fields. The mathematical theory of BNs and their optimization is well developed. Although several open-source BN learners are publicly available, none of them can handle both small and large feature space data while recovering network structures with acceptable accuracy. The software evaluated in this article is a novel BN learning and simulation tool from BERG. It was developed with the goal of learning BNs from "Big Data" in health care, which often exceed hundreds of thousands of features when research is conducted in genomics or multi-omics. This article provides a comprehensive performance evaluation of the software and a comparison with the open-source BN learners. The study used synthetic datasets of discrete, continuous, and mixed data, in both small and large feature spaces. The results showed that the software outperformed the publicly available algorithms in structure recovery precision in almost all of the evaluated settings, achieving true positive rates of 0.9 and precision of 0.8. In addition, it supports all data types, including continuous, discrete, and mixed variables. It is effectively parallelized on a distributed system and can handle datasets with thousands of features, which none of the publicly available tools can process at a desired level of recovery accuracy.
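For readers unfamiliar with the two structure recovery metrics reported above, the following is a minimal sketch of how a true positive rate and a precision can be computed for a learned network. It assumes that edges are compared as directed (parent, child) pairs, so a reversed edge counts as a miss; the abstract does not state the exact matching rule used in the study. The function name and the toy edge lists are hypothetical and are not taken from the article.

```python
def structure_recovery_metrics(true_edges, learned_edges):
    """Return (true_positive_rate, precision) for a learned edge set.

    Assumption: edges are directed (parent, child) tuples and must match
    exactly; the study's own matching rule is not given in the abstract.
    """
    true_set = set(true_edges)
    learned_set = set(learned_edges)
    true_positives = len(true_set & learned_set)

    # True positive rate: share of true edges that were recovered.
    tpr = true_positives / len(true_set) if true_set else 0.0
    # Precision: share of learned edges that are actually true.
    precision = true_positives / len(learned_set) if learned_set else 0.0
    return tpr, precision


# Hypothetical toy networks, for illustration only.
true_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
learned_edges = [("A", "B"), ("B", "C"), ("D", "C"), ("A", "E")]

tpr, precision = structure_recovery_metrics(true_edges, learned_edges)
print(f"TPR = {tpr:.2f}, precision = {precision:.2f}")
```

If edge direction is ignored (skeleton comparison), each pair can be stored as a `frozenset` instead of a tuple, which would make the reversed edge in the toy example count as recovered.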