A novel approach to modeling multifactorial diseases using Ensemble Bayesian Rule classifiers.

Affiliations

School of Computing and Information, Intelligent Systems Program, University of Pittsburgh, 135 N Bellefield Ave, Pittsburgh, PA 15213, United States.

Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Suite 500, Pittsburgh, PA 15206, United States.

Publication information

J Biomed Inform. 2020 Jul;107:103455. doi: 10.1016/j.jbi.2020.103455. Epub 2020 Jun 1.

DOI:10.1016/j.jbi.2020.103455

PMID:32497685

Abstract

Modeling the factors that influence disease phenotypes from biomarker profiling datasets is a critical task in biomedicine. Such datasets are typically generated by high-throughput 'omic' technologies, which help examine disease mechanisms at an unprecedented resolution. These datasets are challenging because they are high-dimensional. The disease mechanisms they capture are also complex, because many diseases are multifactorial: they result from the collective activity of several factors, each with a small effect. Bayesian rule learning (BRL) infers a rule model by learning Bayesian networks from data, and has been shown to be effective in modeling high-dimensional datasets. However, BRL is not efficient at modeling multifactorial diseases, since it suffers from data fragmentation during learning. In this paper, we overcome this limitation by implementing and evaluating three ensemble model combination strategies with BRL: uniform combination (UC; same as Bagging), Bayesian model averaging (BMA), and Bayesian model combination (BMC). We collectively call these Ensemble Bayesian Rule Learning (EBRL). We also introduce a novel method to visualize EBRL models, called the Bayesian Rule Ensemble Visualizing tool (BREVity), which helps extract and interpret the most important rule patterns guiding the predictions made by the ensemble model. Our results on twenty-five public, high-dimensional gene expression datasets of multifactorial diseases suggest that EBRL models using UC and BMC both achieve better predictive performance than BMA and other classic machine learning methods. Furthermore, BMC is found to be more reliable than UC when the ensemble includes sub-optimal models resulting from the stochasticity of the model search process. Together, EBRL and BREVity provide researchers with a promising and novel tool for modeling multifactorial diseases from high-dimensional datasets, one that leverages the strengths of ensemble methods for predictive performance while also providing interpretable explanations for its predictions.
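
To make the difference between the three combination strategies concrete, the following is a minimal, generic sketch and not the authors' implementation. It assumes binary labels and base models that are already trained and output class-1 probabilities; it does not learn BRL rule models. All function names and the toy data are hypothetical. The training log-likelihood is used here as a stand-in for a Bayesian model score, and BMC is approximated by sampling weight vectors from a flat Dirichlet prior.

```python
import math
import random

def uniform_combination(probs):
    """UC: average the models' predicted probabilities with equal weights."""
    n_models, n_items = len(probs), len(probs[0])
    return [sum(p[i] for p in probs) / n_models for i in range(n_items)]

def normalized_weights(log_scores):
    """Turn log scores into weights that sum to 1 (numerically stable softmax).
    With a uniform model prior, this gives BMA-style posterior model weights."""
    top = max(log_scores)
    unnorm = [math.exp(s - top) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def weighted_combination(probs, weights):
    """Combine the models' probabilities with the given weights."""
    n_items = len(probs[0])
    return [sum(w * p[i] for w, p in zip(weights, probs)) for i in range(n_items)]

def log_likelihood(pred, labels, eps=1e-12):
    """Bernoulli log-likelihood of binary labels under predicted probabilities."""
    return sum(math.log(max(eps, p if y == 1 else 1.0 - p))
               for p, y in zip(pred, labels))

def bmc_weights(train_probs, train_labels, n_samples=2000, seed=0):
    """BMC (assumed sampling scheme): draw candidate weight vectors, score each
    weighted ensemble on the training labels, and average the candidates by
    their normalized likelihood."""
    rng = random.Random(seed)
    n_models = len(train_probs)
    candidates, log_liks = [], []
    for _ in range(n_samples):
        draws = [rng.gammavariate(1.0, 1.0) for _ in range(n_models)]
        total = sum(draws)
        w = [d / total for d in draws]  # one sample from Dirichlet(1, ..., 1)
        candidates.append(w)
        combined = weighted_combination(train_probs, w)
        log_liks.append(log_likelihood(combined, train_labels))
    post = normalized_weights(log_liks)  # posterior over sampled combinations
    return [sum(q * w[m] for q, w in zip(post, candidates))
            for m in range(n_models)]

# Hypothetical toy data: three base models scored on four training instances.
train_probs = [
    [0.9, 0.8, 0.2, 0.1],  # model A: agrees well with the labels
    [0.6, 0.7, 0.4, 0.3],  # model B: weaker
    [0.4, 0.5, 0.6, 0.5],  # model C: sub-optimal
]
train_labels = [1, 1, 0, 0]

uc = uniform_combination(train_probs)
w_bma = normalized_weights([log_likelihood(p, train_labels) for p in train_probs])
w_bmc = bmc_weights(train_probs, train_labels)
bmc = weighted_combination(train_probs, w_bmc)
```

In this sketch, UC treats the sub-optimal model C the same as model A, BMA concentrates its weight on the single best-scoring model, and BMC searches over weighted combinations of all models. This is only meant to illustrate the general contrast between the strategies, not to reproduce the paper's results.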

Similar articles

Bayesian rule learning for biomedical data mining.

Bioinformatics. 2010 Mar 1;26(5):668-75. doi: 10.1093/bioinformatics/btq005. Epub 2010 Jan 14.

Ant colony optimization algorithm for interpretable Bayesian classifiers combination: application to medical predictions.

PLoS One. 2014 Feb 3;9(2):e86456. doi: 10.1371/journal.pone.0086456. eCollection 2014.

Tunable structure priors for Bayesian rule learning for knowledge integrated biomarker discovery.

World J Clin Oncol. 2018 Sep 14;9(5):98-109. doi: 10.5306/wjco.v9.i5.98.

Bayesian machine learning ensemble approach to quantify model uncertainty in predicting groundwater storage change.

Sci Total Environ. 2021 May 15;769:144715. doi: 10.1016/j.scitotenv.2020.144715. Epub 2021 Jan 20.

A novel method for predicting kidney stone type using ensemble learning.

Artif Intell Med. 2018 Jan;84:117-126. doi: 10.1016/j.artmed.2017.12.001. Epub 2017 Dec 11.

Learning Parsimonious Classification Rules from Gene Expression Data Using Bayesian Networks with Local Structure.

Data (Basel). 2017 Mar;2(1). doi: 10.3390/data2010005. Epub 2017 Jan 18.

IntelliHealth: A medical decision support application using a novel weighted multi-layer classifier ensemble framework.

J Biomed Inform. 2016 Feb;59:185-200. doi: 10.1016/j.jbi.2015.12.001. Epub 2015 Dec 15.

A Machine Learning Ensemble Classifier for Early Prediction of Diabetic Retinopathy.

J Med Syst. 2017 Nov 9;41(12):201. doi: 10.1007/s10916-017-0853-x.

Selective model averaging with Bayesian rule learning for predictive biomedicine.

AMIA Jt Summits Transl Sci Proc. 2014 Apr 7;2014:17-22. eCollection 2014.

Cited by

Tracking Health, Performance and Recovery in Athletes Using Machine Learning.

Sports (Basel). 2022 Oct 19;10(10):160. doi: 10.3390/sports10100160.
