Michael B. Sohn, Hongzhe Li
Department of Biostatistics and Epidemiology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania 19104, U.S.A.
Biometrics. 2018 Jun;74(2):448-457. doi: 10.1111/biom.12775. Epub 2017 Oct 9.
Distance-based ordination methods, such as principal coordinates analysis (PCoA), are widely used in the analysis of microbiome data. However, when dispersion differs between populations, these methods can lead to misinterpretation of compositional differences among samples. To account for the high sparsity and overdispersion of microbiome data, we propose a GLM-based Ordination Method for Microbiome Samples (GOMMS). The method uses a zero-inflated quasi-Poisson (ZIQP) latent factor model, and its parameters are estimated with an EM algorithm based on the quasi-likelihood. GOMMS performs comparably to the distance-based approach when dispersion effects are negligible and consistently better when dispersion effects are strong, a setting in which the distance-based approach sometimes yields undesirable results. The estimated latent factors from GOMMS can be used to test associations between the microbiome community and covariates or outcomes with standard multivariate tests, and any associations found can be investigated in future confirmatory experiments. We illustrate the method with simulations and an analysis of microbiome samples from the nasopharynx and oropharynx.
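To make the zero-inflation and EM ideas concrete, the following is a minimal sketch and not the GOMMS algorithm itself. It assumes a much simpler setting: a single taxon, an intercept-only zero-inflated Poisson model (a full likelihood rather than the quasi-likelihood), no latent factors, and no dispersion parameter. The function names `simulate_zip` and `fit_zip_em` are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_zip(n, pi, lam, seed=0):
    """Draw n counts from a zero-inflated Poisson (illustrative helper).

    With probability pi a count is a structural zero; otherwise it is
    drawn from Poisson(lam).
    """
    rng = np.random.default_rng(seed)
    structural = rng.random(n) < pi
    counts = rng.poisson(lam, size=n)
    return np.where(structural, 0, counts)


def fit_zip_em(y, n_iter=200, tol=1e-10):
    """EM for an intercept-only zero-inflated Poisson model (a sketch).

    Returns (pi, lam): the estimated structural-zero probability and
    the estimated Poisson mean.
    """
    y = np.asarray(y, dtype=float)
    is_zero = y == 0

    # Crude starting values: half of the observed zeros are structural.
    pi = 0.5 * is_zero.mean()
    lam = max(y.mean(), 1e-8)

    for _ in range(n_iter):
        # E-step: posterior probability that each observed zero is a
        # structural zero; positive counts can never be structural.
        z = np.zeros_like(y)
        p_zero = pi + (1.0 - pi) * np.exp(-lam)
        z[is_zero] = pi / p_zero

        # M-step: closed-form updates given the expected memberships.
        new_pi = z.mean()
        new_lam = y.sum() / (1.0 - z).sum()

        converged = abs(new_pi - pi) + abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break

    return pi, lam
```

Calling `fit_zip_em(simulate_zip(20000, 0.3, 4.0))` should return estimates close to the simulated values. The model described in the abstract differs in important ways: it replaces the scalar mean with a latent factor structure across taxa and samples, and it uses a quasi-likelihood so that overdispersion can be accommodated.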