Datta Jyotishka, Bandyopadhyay Dipankar
Department of Statistics, Virginia Polytechnic Institute and State University, 250 Drillfield Drive, Blacksburg, VA 24061 USA.
Department of Biostatistics, School of Population Health, Virginia Commonwealth University, One Capital Square, 7th Floor, 830 East Main Street, PO Box 980032, Richmond, VA 23298-0032 USA.
J Indian Soc Probab Stat. 2024;25(2):491-515. doi: 10.1007/s41096-024-00194-9. Epub 2024 May 29.
Microbiome studies generate multivariate compositional responses, such as taxa counts, which are strictly non-negative, bounded, reside within a simplex, and are subject to a unit-sum constraint. In the presence of covariates (which can be moderate to high dimensional), they are commonly modeled via the Dirichlet-Multinomial (D-M) regression framework. In this paper, we consider a Bayesian approach for estimation and inference under a D-M compositional framework, and present a comparative evaluation of some state-of-the-art continuous shrinkage priors for efficient variable selection, with the aim of identifying the most significant associations between the available covariates and taxonomic abundance. Specifically, we compare the performance of the horseshoe and horseshoe+ priors (with the Bayesian lasso as the benchmark), using Hamiltonian Monte Carlo techniques for posterior sampling and for generating posterior credible intervals. Our simulation studies using synthetic data demonstrate excellent recovery and estimation accuracy in the sparse parameter regime under the continuous shrinkage priors. We further illustrate our method via application to a motivating oral microbiome dataset generated from the NYC-Hanes study. An RStan implementation of our method is available on GitHub: https://github.com/dattahub/compshrink.
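To make the model in the abstract concrete, the following is a minimal Python sketch of the two ingredients it names: the Dirichlet-Multinomial log-likelihood for one sample of taxa counts, and a draw of regression coefficients from a horseshoe prior. It is not the authors' RStan code. The log-linear link (concentration = exp of the linear predictor), the fixed global scale `tau`, and all function names (`dm_log_pmf`, `concentrations`, `half_cauchy`, `draw_horseshoe_coefs`) are assumptions made here for illustration; the paper itself may parameterize the model differently.

```python
import math
import random


def dm_log_pmf(counts, alpha):
    """Log-pmf of the Dirichlet-Multinomial for one sample.

    counts: non-negative integer counts, one per taxon.
    alpha:  positive concentration parameters, one per taxon.
    """
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: n! / prod(y_j!)
    out = math.lgamma(n + 1) - sum(math.lgamma(y + 1) for y in counts)
    # Gamma(a0) / Gamma(n + a0)
    out += math.lgamma(a0) - math.lgamma(n + a0)
    # prod_j Gamma(y_j + alpha_j) / Gamma(alpha_j)
    out += sum(math.lgamma(y + a) - math.lgamma(a) for y, a in zip(counts, alpha))
    return out


def concentrations(x, beta):
    """Assumed log-linear link: alpha_j = exp(sum_p x[p] * beta[p][j])."""
    n_taxa = len(beta[0])
    return [
        math.exp(sum(x[p] * beta[p][j] for p in range(len(x))))
        for j in range(n_taxa)
    ]


def half_cauchy(rng):
    """Draw from a standard half-Cauchy via the inverse-CDF of the Cauchy."""
    return abs(math.tan(math.pi * (rng.random() - 0.5)))


def draw_horseshoe_coefs(n_coef, rng, tau=1.0):
    """Prior draw: beta_k ~ Normal(0, (tau * lambda_k)^2), lambda_k ~ half-Cauchy(0, 1).

    tau is held fixed here for simplicity (an assumption of this sketch).
    """
    coefs = []
    for _ in range(n_coef):
        lam = half_cauchy(rng)  # local shrinkage scale for this coefficient
        coefs.append(rng.gauss(0.0, tau * lam))
    return coefs


if __name__ == "__main__":
    rng = random.Random(1)
    x = [1.0, 0.5]  # intercept plus one covariate (made-up values)
    # 2 covariates x 3 taxa coefficient matrix drawn from the prior
    beta = [draw_horseshoe_coefs(3, rng, tau=0.5) for _ in x]
    alpha = concentrations(x, beta)
    print(dm_log_pmf([4, 0, 6], alpha))
```

In a full analysis this log-likelihood would be summed over samples and combined with the prior inside a sampler such as Hamiltonian Monte Carlo; the sketch only evaluates the pieces and performs no posterior inference.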