Bayesian Additive Regression Trees using Bayesian Model Averaging.

Authors

Hernández Belinda, Raftery Adrian E, Pennington Stephen R, Parnell Andrew C

Affiliations

School of Mathematics and Statistics, University College Dublin, Ireland.

Department of Statistics, University of Washington, USA.

Publication information

Stat Comput. 2018 Jul;28(4):869-890. doi: 10.1007/s11222-017-9767-1. Epub 2017 Jul 27.

Abstract

Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods in which the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another method that is popular for high-dimensional data is random forests, a machine learning algorithm that grows trees using a greedy search for the best split points. However, its default implementation does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART called BART-BMA, which uses Bayesian Model Averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm that can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the "small n, large p" scenario, which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments, one to distinguish between patients with cardiovascular disease and controls and another to classify aggressive from non-aggressive prostate cancer. We compare our results with those of the main competing methods. Open source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git.
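
As a rough illustration of the Bayesian Model Averaging idea mentioned above, the following Python sketch averages predictions from several candidate models, weighting each by an approximate posterior model probability. It is not the BART-BMA implementation (which is in R and Rcpp at the repository above). The BIC-based weight approximation, the optional cut-off `occam_window`, the function names, and all numeric values are assumptions made for this example only; the abstract does not specify how the model weights are computed.

```python
import math


def bma_weights(bics, occam_window=None):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior model probabilities and the common approximation
    p(M_k | D) proportional to exp(-BIC_k / 2). If occam_window is given,
    models whose BIC exceeds the best BIC by more than that amount are
    given weight zero (an assumption of this sketch).
    """
    best = min(bics)
    # Subtract the best BIC before exponentiating for numerical stability.
    raw = [math.exp(-(b - best) / 2.0) for b in bics]
    if occam_window is not None:
        raw = [r if (b - best) <= occam_window else 0.0
               for r, b in zip(raw, bics)]
    total = sum(raw)
    return [r / total for r in raw]


def bma_predict(predictions, weights):
    """Weighted average of per-model predictions (one list per model)."""
    n_obs = len(predictions[0])
    return [sum(w * p[i] for w, p in zip(weights, predictions))
            for i in range(n_obs)]


# Hypothetical BIC values and predictions for three candidate models.
bics = [100.0, 102.0, 130.0]
preds = [[1.0, 2.0], [1.5, 2.5], [9.0, 9.0]]

weights = bma_weights(bics, occam_window=10.0)
averaged = bma_predict(preds, weights)
print(weights)
print(averaged)
```

In this toy setting the third model falls outside the window and contributes nothing, so the averaged prediction lies between those of the first two models, closer to the one with the lower BIC.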
