Division of Biostatistics, Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, California, USA.
Population Neuroscience and Genetics Lab, University of California San Diego, La Jolla, California, USA.
Stat Med. 2022 Jun 15;41(13):2354-2374. doi: 10.1002/sim.9359. Epub 2022 Mar 10.
Semi-continuous data present challenges in both model fitting and interpretation. Parametric distributions may be inappropriate for the extremely long right tails of such data. Mean effects of covariates are susceptible to extreme values and may fail to capture information relevant to most of the sample. We propose a two-component semi-parametric Bayesian mixture model in which the discrete component is captured by a probability mass (typically at zero) and the continuous component of the density is modeled by a mixture of B-spline densities that can be fit flexibly to any data distribution. The model includes subject-level random effects to allow for application to longitudinal data. We specify prior distributions on the parameters and perform model inference using a Markov chain Monte Carlo (MCMC) Gibbs-sampling algorithm programmed in R. Statistical inference can be made for multiple quantiles of the covariate effects simultaneously, providing a comprehensive view. Various MCMC sampling techniques are used to facilitate convergence. We demonstrate the performance and interpretability of the model via simulations and analyses of data on alcohol binge drinking from the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA) study.
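The two-component structure described above can be illustrated with a minimal sketch. It is not the authors' implementation (theirs is an MCMC Gibbs sampler written in R) and it leaves out covariates, random effects, and priors entirely. It only evaluates a density made of a point mass at zero plus a weighted mixture of normalized B-splines. The knot sequence, the mixture weights, the zero probability, and all function names are hypothetical values chosen for illustration.

```python
def bspline_basis(i, k, knots, x):
    """Cox-de Boor recursion: i-th B-spline of order k (degree k - 1) at x."""
    if k == 1:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = 0.0
    if knots[i + k - 1] > knots[i]:  # skip zero-width spans from repeated knots
        left = ((x - knots[i]) / (knots[i + k - 1] - knots[i])
                * bspline_basis(i, k - 1, knots, x))
    right = 0.0
    if knots[i + k] > knots[i + 1]:
        right = ((knots[i + k] - x) / (knots[i + k] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, knots, x))
    return left + right


def bspline_density(i, k, knots, x):
    """B-spline rescaled by k / (t[i+k] - t[i]) so that it integrates to 1."""
    return k / (knots[i + k] - knots[i]) * bspline_basis(i, k, knots, x)


def semicontinuous_density(y, p_zero, weights, knots, order=4):
    """Point mass p_zero at y == 0; (1 - p_zero) * spline mixture for y > 0."""
    if y == 0:
        return p_zero
    mixture = sum(w * bspline_density(j, order, knots, y)
                  for j, w in enumerate(weights))
    return (1.0 - p_zero) * mixture


def midpoint_integral(f, a, b, n=4000):
    """Plain midpoint rule, used here only as a sanity check."""
    h = (b - a) / n
    return h * sum(f(a + (m + 0.5) * h) for m in range(n))


# Hypothetical settings: cubic splines (order 4) on [0, 8] with repeated
# boundary knots, giving len(knots) - order = 7 basis densities.
knots = [0, 0, 0, 0, 1, 2, 4, 8, 8, 8, 8]
weights = [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02]  # assumed; sums to 1
p_zero = 0.4                                           # assumed P(Y = 0)

continuous_mass = midpoint_integral(
    lambda y: semicontinuous_density(y, p_zero, weights, knots), 0.0, 8.0)
total_mass = p_zero + continuous_mass
print("total probability mass (should be close to 1):", total_mass)
```

Because each basis function is normalized to integrate to one, any set of non-negative weights summing to one yields a valid continuous density. Wider knot spacing in the upper range is one way to let the mixture accommodate a long right tail.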