Randolph Timothy W, Zhao Sen, Copeland Wade, Hullar Meredith, Shojaie Ali
Fred Hutchinson Cancer Research Center.
University of Washington.
Ann Appl Stat. 2018 Mar;12(1):540-566. doi: 10.1214/17-AOAS1102. Epub 2018 Mar 9.
The analysis of human microbiome data is often based on dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample. Common to these ordination methods is the use of biologically motivated definitions of similarity. Principal coordinate analysis, in particular, is often performed using ecologically defined distances, allowing analyses to incorporate context-dependent, non-Euclidean structure. In this paper, we go beyond dimension-reduced ordination methods and describe a framework of high-dimensional regression models that extends these distance-based methods. In particular, we use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized regression models that estimate taxon-specific associations with a phenotype or clinical outcome. Further, we show how this regression framework can be used to address the compositional nature of multivariate predictors composed of relative abundances; that is, vectors whose entries sum to a constant. We illustrate this approach with several simulations using data from two recent studies on gut and vaginal microbiomes. We conclude with an application to our own data, where we also incorporate a significance test for the estimated coefficients that represent associations between microbial abundance and percent fat.
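As a rough illustration of the kind of model the abstract describes, the sketch below fits a ridge-type regression whose penalty is shaped by a similarity kernel among taxa, after a centered log-ratio transform of the relative abundances. It is not the authors' implementation: the function names `kernel_penalized_ridge` and `clr`, the kernel `Q`, the tuning parameter `lam`, the pseudocount, and the simulated data are all assumptions made for this example, and the exponential kernel merely stands in for a phylogeny-derived similarity.

```python
import numpy as np


def clr(X, pseudo=1e-6):
    """Centered log-ratio transform of compositional rows (assumed preprocessing)."""
    L = np.log(X + pseudo)  # pseudocount guards against log(0)
    return L - L.mean(axis=1, keepdims=True)


def kernel_penalized_ridge(Z, y, Q, lam):
    """Minimize ||y - Z b||^2 + lam * b' Q^{-1} b over b.

    Q is a p x p positive-definite similarity kernel among taxa.
    The dual form below avoids inverting Q explicitly:
        b = Q Z' (Z Q Z' + lam I)^{-1} y
    """
    n = Z.shape[0]
    G = Z @ Q @ Z.T  # n x n Gram matrix induced by Q
    alpha = np.linalg.solve(G + lam * np.eye(n), y)
    return Q @ Z.T @ alpha


# Simulated data, purely illustrative.
rng = np.random.default_rng(0)
n, p = 20, 5
X = rng.dirichlet(np.ones(p), size=n)  # relative abundances; each row sums to 1
Z = clr(X)                             # each row now sums to 0
y = rng.normal(size=n)                 # hypothetical phenotype

# Hypothetical taxon-by-taxon distances turned into a similarity kernel.
idx = np.arange(p)
D = np.abs(idx[:, None] - idx[None, :]).astype(float)
Q = np.exp(-D)

lam = 1.0
beta = kernel_penalized_ridge(Z, y, Q, lam)  # one coefficient per taxon
```

Setting `Q` to the identity matrix reduces this to ordinary ridge regression, so the kernel is the only place where extrinsic structure enters the estimate. Taxa that are similar under `Q` are encouraged to have similar coefficients.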