Department of Biostatistics, Mailman School of Public Health, Columbia University, New York.
Division of Epidemiology, School of Public Health, University of Minnesota, Minneapolis, Minnesota.
Biometrics. 2021 Mar;77(1):91-101. doi: 10.1111/biom.13272. Epub 2020 May 4.
Dimension reduction of high-dimensional microbiome data facilitates subsequent analyses such as regression and clustering. Most existing reduction methods cannot fully accommodate the special features of these data, namely count-valued reads and an excess of zeros. In this paper, we propose a zero-inflated Poisson factor analysis model. The model assumes that microbiome read counts follow zero-inflated Poisson distributions, with library size as an offset and with Poisson rates negatively related to the probability of an inflated zero. The latent parameters of the model form a low-rank matrix consisting of interpretable loadings and low-dimensional scores that can be used in further analyses. We develop an efficient and robust expectation-maximization algorithm for parameter estimation. We demonstrate the efficacy of the proposed method in comprehensive simulation studies. An application to the Oral Infections, Glucose Intolerance, and Insulin Resistance Study provides valuable insights into the relation between the subgingival microbiome and periodontal disease.
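The generative structure described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the intercept of 2.0, the Gaussian scores and loadings, the use of relative library size as the offset, and the specific link logit(p) = -tau * log(rate) are all illustrative choices made here to obtain a zero probability that decreases with the Poisson rate.

```python
import numpy as np


def simulate_zip_factor_counts(n_samples=60, n_taxa=25, rank=2, tau=0.5, seed=0):
    """Simulate counts from a zero-inflated Poisson model with low-rank log rates.

    All distributional choices below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Low-dimensional scores (per sample) and loadings (per taxon).
    scores = rng.normal(size=(n_samples, rank))
    loadings = rng.normal(scale=0.5, size=(n_taxa, rank))

    # Log Poisson rates: an assumed intercept plus a rank-`rank` matrix.
    log_rate = 2.0 + scores @ loadings.T
    rate = np.exp(log_rate)

    # Library size enters as a multiplicative offset (relative to its mean).
    library_size = rng.integers(5_000, 20_001, size=n_samples)
    offset = library_size / library_size.mean()

    # Assumed link: logit(zero_prob) = -tau * log(rate), so a higher rate
    # implies a lower probability of an inflated (structural) zero.
    zero_prob = 1.0 / (1.0 + rate ** tau)

    # Draw structural zeros, then Poisson counts for the remaining entries.
    structural_zero = rng.random(size=rate.shape) < zero_prob
    poisson_counts = rng.poisson(offset[:, None] * rate)
    counts = np.where(structural_zero, 0, poisson_counts)
    return counts, zero_prob, rate


counts, zero_prob, rate = simulate_zip_factor_counts()
print("share of zeros:", (counts == 0).mean())
```

Data simulated this way show both features the abstract highlights: integer counts and more zeros than a plain Poisson model would produce, with the excess concentrated in low-rate taxa. Estimating the scores and loadings from such data would require the expectation-maximization algorithm described in the paper, which this sketch does not attempt.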