Tipton Laura, Cuenco Karen T, Huang Laurence, Greenblatt Ruth M, Kleerup Eric, Sciurba Frank, Duncan Steven R, Donahoe Michael P, Morris Alison, Ghedin Elodie
1Department of Computational & Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261 USA.
2Department of Biology, Center for Genomics & Systems Biology, New York University, New York, NY 10003 USA.
BioData Min. 2018 Jun 15;11:12. doi: 10.1186/s13040-018-0173-9. eCollection 2018.
Human microbiome studies in clinical settings generally focus on distinguishing the microbiota in health from that in disease at a specific point in time. However, microbiome samples may be associated with disease severity or continuous clinical health indicators that are often assessed at multiple time points. While the temporal data from clinical and microbiome samples may be informative, analysis of this type of data can be problematic for standard statistical methods.
To identify associations between microbiota and continuous clinical variables measured repeatedly in two studies of the respiratory tract, we adapted a statistical method, the lasso-penalized generalized linear mixed model (LassoGLMM). LassoGLMM can screen for associated clinical variables, incorporate repeated measures of individuals, and address the large number of species found in the microbiome. When the number of variables is an order of magnitude larger than the number of samples, as is common in microbiome studies, LassoGLMM can be imperfect in its variable selection. We overcome this limitation by adding a pre-screening step that reduces the number of variables evaluated in the model. We assessed this adapted two-stage LassoGLMM for its ability to determine which microbes are associated with continuous repeated clinical measures.
We found associations (defined as retaining a non-zero coefficient in the LassoGLMM) between 10 laboratory measurements and 43 bacterial genera in the oral microbiota, and between 2 cytokines and 3 bacterial genera in the lung. We compared our associations with those identified by the Wilcoxon test after dichotomizing our outcomes and identified a non-significant trend towards differential abundance between high and low outcomes. Our two-step LassoGLMM explained more of the variance seen in the outcome of interest than other variants of the LassoGLMM method.
We demonstrated a method that can account for the large number of genera detected in microbiome studies and repeated measures of clinical or longitudinal studies, allowing for the detection of strong associations between microbes and clinical measures. By incorporating the design strengths of repeated measurements and a prescreening step to aid variable selection, our two-step LassoGLMM will be a useful analytic method for investigating relationships between microbes and repeatedly measured continuous outcomes.
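The two-step idea described above (pre-screen the genera, then fit a lasso-penalized model that respects repeated measures) can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration only: the screening statistic (absolute Pearson correlation), the number of genera kept (`keep=30`), the penalty (`lam=0.3`), the simulated data, and all function names. The random intercept of a mixed model is replaced here by a cruder stand-in, within-subject centering, and the outcome is treated as Gaussian.

```python
import numpy as np

def center_within_subject(A, subject):
    """Subtract each subject's mean (a crude stand-in for a random intercept)."""
    A = np.asarray(A, dtype=float).copy()
    for s in np.unique(subject):
        idx = subject == s
        A[idx] -= A[idx].mean(axis=0)
    return A

def prescreen(X, y, keep):
    """Stage 1: keep the `keep` columns with the largest |Pearson correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns get correlation 0
    r = np.abs(Xc.T @ yc) / denom
    return np.sort(np.argsort(r)[::-1][:keep])

def lasso_cd(X, y, lam, n_iter=200):
    """Stage 2: lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X @ beta throughout
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the partial residual (column j removed).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

# Simulated data (hypothetical): 20 subjects x 3 visits, 300 genera,
# with only genera 5 and 17 truly associated with the outcome.
rng = np.random.default_rng(0)
n_subjects, n_visits, n_genera = 20, 3, 300
subject = np.repeat(np.arange(n_subjects), n_visits)
X = rng.normal(size=(n_subjects * n_visits, n_genera))
subject_effect = rng.normal(size=n_subjects)[subject]
y = (2.0 * X[:, 5] - 2.0 * X[:, 17] + subject_effect
     + 0.1 * rng.normal(size=len(subject)))

Xc = center_within_subject(X, subject)
yc = center_within_subject(y, subject)
kept = prescreen(Xc, yc, keep=30)          # stage 1: 300 genera -> 30 candidates
beta = lasso_cd(Xc[:, kept], yc, lam=0.3)  # stage 2: penalized fit on candidates
selected = kept[beta != 0]                 # "associated" = non-zero coefficient
print("genera with non-zero coefficients:", selected)
```

In practice the penalty would be tuned (for example by an information criterion or cross-validation) rather than fixed, and a genuine mixed-model fit would estimate the subject-level variance instead of removing subject means.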