
Bayesian Generalized Linear Models for Analyzing Compositional and Sub-Compositional Microbiome Data via EM Algorithm.

Author information

Zhang Li, Ding Zhenying, Cui Jinhong, Zhou Xiaoxiao, Yi Nengjun

Affiliations

Biostatistics and Bioinformatics Facility, Fox Chase Cancer Center, Philadelphia, Pennsylvania, USA.

Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, USA.

Publication information

Stat Med. 2025 Mar 30;44(7):e70084. doi: 10.1002/sim.70084.

Abstract

The study of compositional microbiome data is critical for exploring the functional roles of microbial communities in human health and disease. Recent advances have shifted from traditional log-ratio transformations of compositional covariates to imposing a zero-sum constraint on the corresponding coefficients. Various approaches, including penalized regression and Markov chain Monte Carlo (MCMC) algorithms, have been extended to enforce this sum-to-zero constraint. However, these methods have limitations: penalized regression yields only point estimates, which limits uncertainty assessment, while MCMC methods, although reliable, are computationally intensive, particularly for high-dimensional data. To address these challenges, we propose Bayesian generalized linear models (GLMs) for analyzing compositional and sub-compositional microbiome data. Our model places a spike-and-slab double-exponential prior on the microbiome coefficients, which induces weak shrinkage on large coefficients and strong shrinkage on irrelevant ones, making it well suited to high-dimensional microbiome data. The sum-to-zero constraint is handled through soft centering, by placing a prior distribution on the sum of the compositional or sub-compositional coefficients. To reduce the computational burden, we developed a fast and stable algorithm that incorporates expectation-maximization (EM) steps into the standard iteratively weighted least squares (IWLS) algorithm for fitting GLMs. The performance of the proposed method was assessed through extensive simulation studies, which show that our approach outperforms existing methods, with more accurate coefficient estimates and lower prediction error. We also applied the proposed method to a microbiome study to identify microorganisms associated with inflammatory bowel disease (IBD). The methods are implemented in the freely available R package BhGLM (https://github.com/nyiuab/BhGLM).
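To make the EM-IWLS idea concrete, here is a minimal Python sketch for a logistic model on log-transformed compositions. It is not the BhGLM code: the function name `em_iwls_logcontrast`, the defaults (`s0`, `s1`, `theta`, `sum_sd`) and the choice to hold the inclusion probability `theta` fixed are assumptions for illustration. The sketch writes the double-exponential prior as a scale mixture of normals, so each E-step yields a slab probability p_j and an expected prior precision (p_j/s1 + (1 - p_j)/s0)/|beta_j| for each coefficient, and each M-step is one penalized IWLS step. The soft sum-to-zero constraint enters as a N(0, sum_sd^2) prior on the sum of the microbiome coefficients, which adds a rank-one precision term to the normal equations.

```python
import numpy as np


def sigmoid(eta):
    # Logistic function; clipping avoids overflow in np.exp.
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))


def em_iwls_logcontrast(X, y, s0=0.04, s1=1.0, theta=0.5, sum_sd=1e-3,
                        max_iter=500, tol=1e-6):
    """Hypothetical EM-IWLS sketch for a logistic log-contrast model.

    X: (n, m) log-compositions; y: (n,) 0/1 outcomes.
    Assumed prior: beta_j ~ DE(0, s1) with prob. theta, else DE(0, s0);
    soft sum-to-zero prior: sum(beta_1..beta_m) ~ N(0, sum_sd**2).
    Returns (intercept + coefficients, slab probabilities p_j).
    """
    n, m = X.shape
    Xd = np.column_stack([np.ones(n), X])      # unpenalized intercept first
    c = np.r_[0.0, np.ones(m)]                 # c @ beta = sum of microbiome coefs
    constraint_prec = np.outer(c, c) / sum_sd**2
    beta = np.zeros(m + 1)
    prior_prec = np.ones(m)                    # ridge-like start so beta can leave 0
    p = np.full(m, theta)
    for _ in range(max_iter):
        # M-step: one IWLS (Newton) step with total Gaussian prior precision Q.
        eta = Xd @ beta
        mu = sigmoid(eta)
        w = mu * (1.0 - mu)
        Q = np.diag(np.r_[0.0, prior_prec]) + constraint_prec
        H = Xd.T @ (w[:, None] * Xd) + Q
        rhs = Xd.T @ (w * eta + (y - mu))      # equals X'Wz without dividing by w
        beta_new = np.linalg.solve(H, rhs)
        # E-step: posterior slab probability from the two DE densities.
        b = np.abs(beta_new[1:])
        logit_p = (np.log(theta / (1.0 - theta)) + np.log(s0 / s1)
                   + b / s0 - b / s1)
        p = sigmoid(logit_p)
        # E[1/tau_j^2 | beta_j] under the normal/exponential scale mixture.
        prior_prec = (p / s1 + (1.0 - p) / s0) / np.maximum(b, 1e-6)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, p


# Usage on simulated (not real) data: 10 taxa, 3 with nonzero effects.
rng = np.random.default_rng(0)
n, m = 200, 10
raw = rng.gamma(shape=1.0, size=(n, m))
X = np.log(raw / raw.sum(axis=1, keepdims=True))        # log-compositions
true_beta = np.r_[1.0, -0.5, -0.5, np.zeros(m - 3)]      # sums to zero
y = rng.binomial(1, sigmoid(X @ true_beta)).astype(float)
beta_hat, p_hat = em_iwls_logcontrast(X, y)
print(np.round(beta_hat[1:], 2), beta_hat[1:].sum())
```

Because the constraint is a prior rather than a hard restriction, the fitted coefficients sum to approximately, not exactly, zero; a smaller `sum_sd` tightens the constraint. Restricting `c` to a subset of taxa would give the sub-compositional variant under the same assumptions.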