Tang Zheng-Zheng, Chen Guanhua, Alekseyenko Alexander V
Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN 37203, USA.
Biomedical Informatics Center, Department of Public Health Sciences and Department of Oral Health Sciences, Medical University of South Carolina, Charleston, SC 29403, USA.
Bioinformatics. 2016 Sep 1;32(17):2618-25. doi: 10.1093/bioinformatics/btw311. Epub 2016 May 19.
Recent advances in sequencing technology have made it possible to obtain high-throughput data on the composition of microbial communities and to study the effects of dysbiosis on the human host. Analysis of pairwise intersample distances quantifies the association between microbiome diversity and covariates of interest (e.g., environmental factors, clinical outcomes, treatment groups). Multiple distance metrics are available when designing these analyses. Most distance-based methods, however, use a single distance and are underpowered if that distance is poorly chosen. In addition, distance-based tests cannot flexibly handle confounding variables, which can result in excessive false-positive findings.
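To make the idea of competing distance choices concrete, the following minimal sketch computes two common intersample distances from a small table of taxon counts. Bray-Curtis and Jaccard are used here only as illustrative assumptions, since the UniFrac family discussed in this work additionally requires a phylogenetic tree; the count table and function names are hypothetical.

```python
import numpy as np

def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity on relative abundances (rows = samples)."""
    p = counts / counts.sum(axis=1, keepdims=True)
    # For proportions, sum(p_i + p_j) = 2, so the dissimilarity is half the L1 distance.
    return np.abs(p[:, None, :] - p[None, :, :]).sum(axis=2) / 2.0

def jaccard(counts):
    """Pairwise Jaccard distance on presence/absence (assumes no empty samples)."""
    present = counts > 0
    shared = (present[:, None, :] & present[None, :, :]).sum(axis=2)
    union = (present[:, None, :] | present[None, :, :]).sum(axis=2)
    return 1.0 - shared / union

# Hypothetical counts: 3 samples (rows) by 4 taxa (columns).
counts = np.array([[10, 0, 5, 0],
                   [0, 8, 2, 0],
                   [3, 3, 3, 3]], dtype=float)
bc = bray_curtis(counts)  # sensitive to shifts in abundance
jc = jaccard(counts)      # sensitive only to which taxa are present
```

The two matrices rank sample pairs differently because one weights abundance and the other only presence, which is why a test built on a single distance can miss an association that another distance would capture.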
We derive presence-weighted UniFrac to complement the existing UniFrac distances for more powerful detection of variation in species richness. We develop PERMANOVA-S, a new distance-based method that tests the association of microbiome composition with any covariates of interest. PERMANOVA-S improves on the commonly used Permutation Multivariate Analysis of Variance (PERMANOVA) test by allowing flexible confounder adjustment and by ensembling multiple distances. We conducted extensive simulation studies to evaluate the performance of different distances under various patterns of association. These simulations demonstrate that the power of the test depends on how well the selected distance captures the nature of the association. The PERMANOVA-S unified test combines multiple distances and achieves good power regardless of the pattern of the underlying association. We demonstrate the usefulness of our approach by reanalyzing several real microbiome datasets.
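As a rough illustration of a distance-based permutation test that pools several distances, the sketch below computes the classical one-way PERMANOVA pseudo-F statistic and combines distances by taking the minimum per-distance p-value, calibrated with the same set of permutations. The minimum-p combination is an assumption made for illustration and is not necessarily how PERMANOVA-S forms its unified test; the sketch does not include confounder adjustment, and the function names are hypothetical rather than part of miProfile.

```python
import numpy as np

def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a square distance matrix and group labels."""
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    d2 = dist ** 2
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    a = len(groups)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

def permanova_min_p(dists, labels, n_perm=999, seed=0):
    """Per-distance permutation p-values plus a minimum-p combined p-value."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    # Row 0 holds the observed statistics; rows 1..n_perm hold permuted ones.
    stats = np.empty((n_perm + 1, len(dists)))
    stats[0] = [pseudo_f(d, labels) for d in dists]
    for b in range(1, n_perm + 1):
        perm = rng.permutation(labels)
        stats[b] = [pseudo_f(d, perm) for d in dists]
    # pvals[b, k]: fraction of all rows whose statistic is >= that of row b.
    pvals = (stats[None, :, :] >= stats[:, None, :]).mean(axis=1)
    # Compare the observed minimum p-value with its permutation distribution.
    min_p = pvals.min(axis=1)
    combined_p = (min_p <= min_p[0]).mean()
    return pvals[0], combined_p

# Hypothetical data: two well-separated groups of 10 samples in two dimensions.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(10, 1, (10, 2))])
labels = np.repeat([0, 1], 10)
euclid = np.sqrt(((x[:, None] - x[None]) ** 2).sum(axis=2))
manhattan = np.abs(x[:, None] - x[None]).sum(axis=2)
per_distance_p, combined_p = permanova_min_p([euclid, manhattan], labels)
```

Reusing the same permuted labels for every distance preserves the correlation among the per-distance statistics, so the combined p-value stays valid without a separate multiple-testing correction.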
miProfile software is freely available at https://medschool.vanderbilt.edu/tang-lab/software/miProfile
Contact: z.tang@vanderbilt.edu or g.chen@vanderbilt.edu
Supplementary data are available at Bioinformatics online.