Tang Zheng-Zheng, Chen Guanhua, Alekseyenko Alexander V, Li Hongzhe
Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN 37203, USA.
Department of Public Health Sciences.
Bioinformatics. 2017 May 1;33(9):1278-1285. doi: 10.1093/bioinformatics/btw804.
Motivation: Association analysis of microbiome composition with disease-related outcomes provides invaluable knowledge for understanding the roles of microbes in the underlying disease mechanisms. Proper analysis of sparse compositional microbiome data is challenging. Existing methods rely on strong assumptions about the data structure and fail to pinpoint the associated microbial communities.
Results: We develop a general framework to: (i) perform robust association tests for a microbial community that exhibits arbitrary inter-taxa dependencies; (ii) localize lineages on the taxonomic tree that are associated with covariates (e.g. disease status); and (iii) assess the overall association of the whole microbial community with the covariates. Unlike existing methods for microbiome association analysis, our framework makes no distributional assumptions about the microbiome data, allows adjustment for confounding variables, accommodates excessive zero observations, and incorporates taxonomic information. We perform extensive simulation studies under a wide range of scenarios to evaluate the new methods and demonstrate a substantial power gain over existing methods. The advantages of the proposed framework are further demonstrated with real datasets from two microbiome studies. The relevant R package miLineage is publicly available.
Availability and Implementation: The miLineage package, manual and tutorial are available at https://medschool.vanderbilt.edu/tang-lab/software/miLineage .
Supplementary information: Supplementary data are available at Bioinformatics online.