Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America.
The Broad Institute, Cambridge, Massachusetts, United States of America.
PLoS Comput Biol. 2021 Nov 16;17(11):e1009442. doi: 10.1371/journal.pcbi.1009442. eCollection 2021 Nov.
It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata with microbial community measurements, due in part to the quantitative properties of those measurements. Microbiome multi-omics data are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2's linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome Project (HMP2) which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
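As an illustration of what "controlling false discovery" can look like when one association test is run per microbial feature, the following is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. The abstract does not name the correction procedure, so the choice of Benjamini-Hochberg is an assumption here; the function name `benjamini_hochberg`, the feature names, and the p-values are all hypothetical and are not taken from MaAsLin 2 or from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: one p-value per feature, each from its own association test.
    This is an illustrative sketch, not the MaAsLin 2 implementation.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (a smaller p never gets a larger adjusted value).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-feature p-values (made up for illustration only).
features = ["feature_A", "feature_B", "feature_C", "feature_D"]
raw_p = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(raw_p)

# Keep features whose adjusted p-value is below a chosen threshold
# (0.05 here, also an assumption).
significant = [f for f, q in zip(features, q_values) if q < 0.05]
```

Working from the largest p-value downward keeps the adjustment to a single pass after sorting, and the running minimum is what guarantees that the adjusted values preserve the ordering of the raw p-values.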