A novel approach to modeling multifactorial diseases using Ensemble Bayesian Rule classifiers.

Affiliations

School of Computing and Information, Intelligent Systems Program, University of Pittsburgh, 135 N Bellefield Ave, Pittsburgh, PA 15213, United States.

Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Suite 500, Pittsburgh, PA 15206, United States.

Publication information

J Biomed Inform. 2020 Jul;107:103455. doi: 10.1016/j.jbi.2020.103455. Epub 2020 Jun 1.

Abstract

Modeling the factors that influence disease phenotypes from biomarker profiling datasets is a critical task in biomedicine. Such datasets are typically generated by high-throughput 'omic' technologies, which help examine disease mechanisms at an unprecedented resolution. These datasets are challenging because they are high-dimensional. The disease mechanisms they study are also complex because many diseases are multifactorial, resulting from the collective activity of several factors, each with a small effect. Bayesian rule learning (BRL) infers a rule model from Bayesian networks learned from data, and has been shown to be effective in modeling high-dimensional datasets. However, BRL is not efficient at modeling multifactorial diseases since it suffers from data fragmentation during learning. In this paper, we overcome this limitation by implementing and evaluating three types of ensemble model combination strategies with BRL: uniform combination (UC; same as Bagging), Bayesian model averaging (BMA), and Bayesian model combination (BMC), collectively called Ensemble Bayesian Rule Learning (EBRL). We also introduce a novel method to visualize EBRL models, called the Bayesian Rule Ensemble Visualizing tool (BREVity), which helps extract and interpret the most important rule patterns guiding the predictions made by the ensemble model. Our results on twenty-five public, high-dimensional gene expression datasets of multifactorial diseases suggest that EBRL models using UC and BMC both achieve better predictive performance than BMA and other classic machine learning methods. Furthermore, BMC is found to be more reliable than UC when the ensemble includes sub-optimal models resulting from the stochasticity of the model search process. Together, EBRL and BREVity provide researchers with a promising and novel tool for modeling multifactorial diseases from high-dimensional datasets, one that leverages the strengths of ensemble methods for predictive performance while also providing interpretable explanations for its predictions.
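The abstract names three combination strategies but gives no formulas, so the following is only a minimal sketch of how such weightings are commonly computed for a binary classifier ensemble, not the authors' implementation. All function names, the toy inputs, and the Dirichlet-sampling form of BMC are assumptions made for illustration. The UC function covers only the equal-weight combination step, not the bootstrap resampling that Bagging also involves.

```python
import math
import random


def uniform_weights(n_members):
    """UC (combination step only): every ensemble member gets the same weight."""
    return [1.0 / n_members] * n_members


def bma_weights(log_scores):
    """BMA-style weights: normalize exp(log score) over members.

    Assumes a uniform prior over members, so the posterior weight is
    proportional to each member's (marginal) likelihood. Subtracting the
    maximum keeps the exponentials numerically stable.
    """
    top = max(log_scores)
    unnorm = [math.exp(s - top) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def combine(member_probs, weights):
    """Weighted average of the members' predicted P(y = 1) for one example."""
    return sum(w * p for w, p in zip(weights, member_probs))


def log_likelihood(weights, train_probs, labels):
    """Log-likelihood of the training labels under the weighted ensemble.

    train_probs[i][k] is member k's predicted P(y = 1) for example i.
    """
    ll = 0.0
    for probs, y in zip(train_probs, labels):
        p = min(max(combine(probs, weights), 1e-9), 1.0 - 1e-9)
        ll += math.log(p if y == 1 else 1.0 - p)
    return ll


def bmc_weights(train_probs, labels, n_samples=200, seed=0):
    """BMC-style weights (simplified, assumed formulation).

    Samples candidate weight vectors from a flat Dirichlet, scores each
    whole combination by its training likelihood, and returns the
    posterior-weighted average of the candidates.
    """
    rng = random.Random(seed)
    k = len(train_probs[0])
    candidates = []
    for _ in range(n_samples):
        draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
        total = sum(draws)
        candidates.append([d / total for d in draws])
    scores = [log_likelihood(w, train_probs, labels) for w in candidates]
    posterior = bma_weights(scores)  # uniform prior over sampled combinations
    return [
        sum(posterior[j] * candidates[j][i] for j in range(n_samples))
        for i in range(k)
    ]
```

In this sketch, BMA concentrates weight on the single best-scoring member, whereas BMC scores entire weightings of the ensemble, which is one way to see why the two can behave differently when some members are sub-optimal.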
