Department of Statistics, Florida State University, Tallahassee, Florida.
Department of Statistics and Data Sciences, University of Texas at Austin, Austin, Texas.
Stat Med. 2023 Feb 10;42(3):246-263. doi: 10.1002/sim.9613. Epub 2022 Nov 25.
This paper introduces a nonparametric regression approach for univariate and multivariate skewed responses using Bayesian additive regression trees (BART). Existing BART methods use ensembles of decision trees to model a mean function, and have become popular due to their high prediction accuracy and ease of use. The usual assumption of a univariate Gaussian error distribution, however, is restrictive in many biomedical applications. Motivated by an oral health study, we propose the skewBART model, an extension of BART that addresses this problem. We then extend skewBART to allow for multivariate responses, with information shared across the decision trees associated with different responses within the same subject. The methodology accommodates within-subject association, and allows the skewness parameter to differ across the components of the multivariate response. We illustrate the benefits of our multivariate skewBART proposal over existing alternatives via simulation studies and an application to the oral health dataset, which has bivariate, highly skewed responses. Our methodology can be applied using the R package skewBART, available on GitHub.
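To make the contrast with Gaussian errors concrete, here is a minimal simulation sketch of a regression with skewed errors. It is an illustration only, not the skewBART package or its API. The abstract does not name the error family, so the skew-normal distribution used here is an assumption, as are the function names, the sine-shaped mean function, and the parameter `delta`. The sketch covers the univariate case only.

```python
import math
import random


def skew_normal_sample(rng, delta, scale=1.0):
    """Draw one skew-normal variate (assumed error family, for illustration).

    Uses the standard stochastic representation
        scale * (delta * |Z0| + sqrt(1 - delta^2) * Z1),
    with Z0, Z1 independent standard normals and -1 < delta < 1.
    delta = 0 recovers a Gaussian; delta > 0 gives right skew.
    """
    z0 = abs(rng.gauss(0.0, 1.0))
    z1 = rng.gauss(0.0, 1.0)
    return scale * (delta * z0 + math.sqrt(1.0 - delta ** 2) * z1)


def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate(n, delta, seed=0):
    """Simulate y = f(x) + error with skew-normal errors.

    The mean function f is a hypothetical stand-in for the
    sum-of-trees fit that a BART-type model would estimate.
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    errors = [skew_normal_sample(rng, delta) for _ in range(n)]
    ys = [math.sin(2.0 * math.pi * x) + e for x, e in zip(xs, errors)]
    return xs, ys, errors


if __name__ == "__main__":
    for delta in (0.0, 0.95):
        _, _, errors = simulate(20000, delta)
        print(f"delta={delta}: error skewness = {sample_skewness(errors):.3f}")
```

In this representation the errors are not centered at zero when `delta` is nonzero: their mean is `scale * delta * sqrt(2 / pi)`. A Gaussian-error model fitted to such data would absorb that shift into its mean function while ignoring the asymmetry around it.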