Department of Biostatistics, University of Alabama at Birmingham, Alabama, USA.
School of Data Science and Analytics, Kennesaw State University, Kennesaw, Georgia, USA.
Stat Med. 2024 Jan 15;43(1):141-155. doi: 10.1002/sim.9946. Epub 2023 Nov 20.
The impact of the microbiome on human health and disease has attracted considerable scientific attention. Researchers seek to link microbiome features to health conditions in order to predict disease and develop personalized medicine strategies. However, three properties of microbiome data limit the practicality of conventional models. First, the observed data are compositional: the counts within each sample are bound by a fixed-sum constraint. Second, microbiome data are often high-dimensional, with the number of variables exceeding the number of available samples. Third, microbiome features that exhibit phenotypical similarity usually have similar influence on the response variable. To address these challenges, we proposed Bayesian compositional generalized linear models for analyzing microbiome data (BCGLM), with a structured regularized horseshoe prior on the compositional coefficients and a soft sum-to-zero restriction imposed on the coefficients through the prior distribution. We fitted the proposed models using Markov chain Monte Carlo (MCMC) algorithms with the R package rstan. We assessed the performance of the proposed method in extensive simulation studies. The simulation results show that our approach outperforms existing methods, with more accurate coefficient estimates and lower prediction error. We also applied the proposed method to a microbiome study to identify microorganisms linked to inflammatory bowel disease (IBD). To make this work reproducible, the code and data used in this article are available at https://github.com/Li-Zhang28/BCGLM.
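The modeling ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' rstan implementation, and several pieces are assumptions made only for illustration: the pseudo-count of 0.5, the prior standard deviation of 0.001 for the soft constraint, the function names, and the standard (unstructured) regularized horseshoe scale formula. The abstract does not specify how the "structured" variant is built, so it is not reproduced here. The sketch shows why a sum-to-zero restriction suits compositional predictors: when the coefficients sum to zero, the log-linear predictor does not change if a sample is rescaled.

```python
import math

def to_composition(counts, pseudo=0.5):
    """Convert raw counts to proportions that sum to 1.

    The pseudo-count (an assumed value) avoids log(0) for unobserved taxa.
    """
    shifted = [c + pseudo for c in counts]
    total = sum(shifted)
    return [s / total for s in shifted]

def log_contrast(x, beta):
    """Linear predictor sum_j beta_j * log(x_j) for a positive vector x."""
    return sum(b * math.log(v) for b, v in zip(beta, x))

def soft_sum_to_zero_logprior(beta, sd=0.001):
    """Log-density, up to a constant, of sum(beta) ~ Normal(0, sd).

    A small sd (assumed here) pulls the coefficient sum toward zero
    without enforcing the constraint exactly.
    """
    return -0.5 * (sum(beta) / sd) ** 2

def regularized_scale(lam, tau, c):
    """Local scale of the standard regularized horseshoe.

    Small lam is left nearly unchanged; large lam is capped near c / tau,
    so the prior sd of a coefficient, tau * scale, never exceeds c.
    """
    return math.sqrt(c**2 * lam**2 / (c**2 + tau**2 * lam**2))

# Hypothetical counts for four taxa in one sample.
x = to_composition([120, 30, 0, 50])
beta = [1.0, -0.5, -0.5, 0.0]  # coefficients that sum to zero

eta = log_contrast(x, beta)
eta_rescaled = log_contrast([1000 * v for v in x], beta)
print(abs(eta - eta_rescaled) < 1e-9)  # rescaling leaves the predictor unchanged
```

In a full model the soft constraint term would be added to the log-posterior alongside the likelihood and the horseshoe prior. Here it is evaluated on its own only to show that coefficient vectors whose sum is far from zero are heavily penalized.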