Bayesian Additive Regression Trees using Bayesian Model Averaging.

Authors

Hernández Belinda, Raftery Adrian E, Pennington Stephen R, Parnell Andrew C

Affiliations

School of Mathematics and Statistics, University College Dublin, Ireland.

Department of Statistics, University of Washington, USA.

Publication information

Stat Comput. 2018 Jul;28(4):869-890. doi: 10.1007/s11222-017-9767-1. Epub 2017 Jul 27.

DOI:10.1007/s11222-017-9767-1

PMID:30449953

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6238959/

Abstract

Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods in which the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another method that is popular for high-dimensional data is random forests, a machine learning algorithm that grows trees using a greedy search for the best split points. However, its default implementation does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART called BART-BMA, which uses Bayesian Model Averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm that can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the "small n large p" scenario, which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments, one to distinguish between patients with cardiovascular disease and controls and another to classify aggressive from non-aggressive prostate cancer. We compare our results with those of the main competing methods. Open source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git.
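The model-averaging idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' BART-BMA implementation (see their R/Rcpp repository for that). It assumes, for illustration only, that each candidate model found by a greedy search has a BIC value, that posterior model weights are approximated as proportional to exp(-BIC/2), and that models far from the best one are discarded. The function names, the `occam_window` parameter, and all numbers are hypothetical.

```python
import math


def bma_weights(bics, occam_window=None):
    """Approximate posterior model weights from BIC values.

    Assumption: weight_k is proportional to exp(-0.5 * BIC_k). If
    `occam_window` is given, models whose BIC exceeds the best BIC by
    more than that amount receive zero weight.
    """
    best = min(bics)
    # Subtracting the best BIC keeps exp() numerically stable.
    rel = [math.exp(-0.5 * (b - best)) for b in bics]
    if occam_window is not None:
        rel = [r if (b - best) <= occam_window else 0.0
               for r, b in zip(rel, bics)]
    total = sum(rel)
    return [r / total for r in rel]


def bma_predict(predictions, weights):
    """Weighted average of the per-model point predictions."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical BIC values and predictions for three candidate models.
bics = [100.0, 102.0, 120.0]
preds = [1.0, 2.0, 10.0]

weights = bma_weights(bics, occam_window=10.0)
averaged = bma_predict(preds, weights)
print(weights, averaged)
```

In this toy example the third model falls outside the window and gets zero weight, so the averaged prediction is a blend of the first two models only, dominated by the one with the lowest BIC.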

Similar articles

Genome-wide prediction using Bayesian additive regression trees.

Genet Sel Evol. 2016 Jun 10;48(1):42. doi: 10.1186/s12711-016-0219-8.

Application of Bayesian Additive Regression Trees for Estimating Daily Concentrations of PM Components.

Atmosphere (Basel). 2020 Nov;11(11). doi: 10.3390/atmos11111233. Epub 2020 Nov 16.

A flexible approach for variable selection in large-scale healthcare database studies with missing covariate and outcome data.

BMC Med Res Methodol. 2022 May 4;22(1):132. doi: 10.1186/s12874-022-01608-7.

An Efficient and Effective Model to Handle Missing Data in Classification.

Biomed Res Int. 2020 Nov 25;2020:8810143. doi: 10.1155/2020/8810143. eCollection 2020.

Nonparametric failure time: Time-to-event machine learning with heteroskedastic Bayesian additive regression trees and low information omnibus Dirichlet process mixtures.

Biometrics. 2023 Dec;79(4):3023-3037. doi: 10.1111/biom.13857. Epub 2023 Apr 16.

Bayesian additive regression trees and the General BART model.

Stat Med. 2019 Nov 10;38(25):5048-5069. doi: 10.1002/sim.8347. Epub 2019 Aug 28.

Bayesian Additive Regression Trees (BART) with covariate adjusted borrowing in subgroup analyses.

J Biopharm Stat. 2022 Jul 4;32(4):613-626. doi: 10.1080/10543406.2022.2089160. Epub 2022 Jun 23.

Decision making and uncertainty quantification for individualized treatments using Bayesian Additive Regression Trees.

Stat Methods Med Res. 2019 Apr;28(4):1079-1093. doi: 10.1177/0962280217746191. Epub 2017 Dec 18.

A semiparametric modeling approach using Bayesian additive regression trees with an application to evaluate heterogeneous treatment effects.

Ann Appl Stat. 2019 Sep;13(3):1989-2010. doi: 10.1214/19-AOAS1266. Epub 2019 Oct 17.

Cited by

Benefits of public awareness in mitigating cystic echinococcosis risk in Western China: A climate and socio-economic perspective.

PLoS Negl Trop Dis. 2025 Jul 9;19(7):e0013182. doi: 10.1371/journal.pntd.0013182. eCollection 2025 Jul.

Adaptive Use of Co-Data Through Empirical Bayes for Bayesian Additive Regression Trees.

Stat Med. 2025 Feb 28;44(5):e70004. doi: 10.1002/sim.70004.

New pattern of individualized management of chronic diseases: focusing on inflammatory bowel diseases and looking to the future.

Front Med (Lausanne). 2023 May 10;10:1186143. doi: 10.3389/fmed.2023.1186143. eCollection 2023.

Do German economic research institutes publish efficient growth and inflation forecasts? A Bayesian analysis.

J Appl Stat. 2019 Aug 8;47(4):698-723. doi: 10.1080/02664763.2019.1652253. eCollection 2020.

A framework for mutational signature analysis based on DNA shape parameters.

PLoS One. 2022 Jan 11;17(1):e0262495. doi: 10.1371/journal.pone.0262495. eCollection 2022.

Application of Bayesian Additive Regression Trees for Estimating Daily Concentrations of PM Components.

Atmosphere (Basel). 2020 Nov;11(11). doi: 10.3390/atmos11111233. Epub 2020 Nov 16.

Borrowing from supplemental sources to estimate causal effects from a primary data source.

Stat Med. 2021 Oct 30;40(24):5115-5130. doi: 10.1002/sim.9114. Epub 2021 Jun 22.

An Efficient and Effective Model to Handle Missing Data in Classification.

Biomed Res Int. 2020 Nov 25;2020:8810143. doi: 10.1155/2020/8810143. eCollection 2020.

Artificial Intelligence in Pharmacoepidemiology: A Systematic Review. Part 1-Overview of Knowledge Discovery Techniques in Artificial Intelligence.

Front Pharmacol. 2020 Jul 16;11:1028. doi: 10.3389/fphar.2020.01028. eCollection 2020.

Unraveling the habitat preferences of two closely related bumble bee species in Eastern Europe.

Ecol Evol. 2020 Apr 15;10(11):4773-4790. doi: 10.1002/ece3.6232. eCollection 2020 Jun.
