A Novel Bayesian Network Structural Learning Algorithm and Its Comprehensive Performance Evaluation Against Open-Source Software

Affiliation

BERG Health, Framingham, Massachusetts, USA.

Publication information

J Comput Biol. 2020 May;27(5):698-708. doi: 10.1089/cmb.2019.0210. Epub 2019 Sep 5.

Abstract

Structural learning of Bayesian networks (BNs) from observational data has gained increasing applied use and attention across scientific and industrial areas. The mathematical theory of BNs and their optimization is well developed. Although several open-source BN learners are in the public domain, none of them can handle both small and large feature-space data and recover network structures with acceptable accuracy. The software evaluated here is a novel BN learning and simulation tool from BERG. It was developed with the goal of learning BNs from "Big Data" in health care, which often exceeds hundreds of thousands of features when research is conducted in genomics or multi-omics. This article provides a comprehensive performance evaluation of the software and compares it with the open-source BN learners. The study investigated synthetic datasets of discrete, continuous, and mixed data in small and large feature spaces, respectively. The results demonstrated that the software outperformed the publicly available algorithms in structure recovery precision in almost all of the evaluated settings, achieving true positive rates of 0.9 and precision of 0.8. In addition, it supports all data types, including continuous, discrete, and mixed variables. It is effectively parallelized on a distributed system and can work with datasets of thousands of features, which are infeasible for any of the publicly available tools at a desired level of recovery accuracy.
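The abstract reports structure recovery in terms of true positive rate and precision. As a minimal sketch of how such metrics are commonly computed, the Python snippet below compares a learned edge list against a known ground-truth edge list. The abstract does not describe the exact scoring protocol, so matching on directed edges (where a reversed edge counts as both a false positive and a false negative) is an assumption, and the function name `edge_recovery_metrics` and the toy networks are hypothetical.

```python
def edge_recovery_metrics(true_edges, learned_edges):
    """Compare directed edge sets of a true and a learned network.

    Assumption: edges are (parent, child) tuples and direction matters,
    so a reversed edge is scored as one false positive and one false negative.
    """
    true_set = set(true_edges)
    learned_set = set(learned_edges)

    tp = len(true_set & learned_set)   # edges recovered correctly
    fp = len(learned_set - true_set)   # learned edges absent from the truth
    fn = len(true_set - learned_set)   # true edges that were missed

    # Precision: fraction of learned edges that are correct.
    precision = tp / (tp + fp) if learned_set else 0.0
    # True positive rate (recall): fraction of true edges that were recovered.
    tpr = tp / (tp + fn) if true_set else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "tpr": tpr}


# Hypothetical toy networks: the learner reverses the D -> E edge.
true_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D"), ("D", "E")]
learned_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D"), ("E", "D")]

metrics = edge_recovery_metrics(true_edges, learned_edges)
print(metrics)
```

Scoring the undirected skeleton instead would only require normalizing each edge with `frozenset` before building the sets; which variant a given study uses should be checked against its methods section.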

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac80/7232674/9f91a2e61549/cmb.2019.0210_figure1.jpg
