Department of Computer Science, University of Central Florida, Orlando, Florida, United States of America.
Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, Florida, United States of America.
PLoS One. 2018 Jun 14;13(6):e0198773. doi: 10.1371/journal.pone.0198773. eCollection 2018.
Dozens of computational methods have been developed to identify the species present in a metagenomic dataset. Many of these methods depend on the available sequenced microbial species, which are still far from representative. To see how newly sequenced genomes affect analysis results, we re-analyzed a shotgun metagenomic dataset composed of twelve colitis-free metagenomic samples and ten colitis-related metagenomic samples. Unexpectedly, we identified at least two new phyla that may be related to colitis development in patients, in addition to the phylum identified previously. The differences between the two types of samples associated with the two new phyla are statistically more significant than the difference associated with the previously identified phylum. Moreover, the abundance of the two new phyla correlates more strongly with the severity of colitis. Surprisingly, even when repeating the analyses implemented in the previous study, we found that at least one main conclusion of that study is not supported. Our study indicates the importance of re-analyzing previously generated metagenomic datasets and the necessity of considering multiple updated tools in metagenomic studies. It also sheds light on the limitations of the currently popular tools and the importance of inferring the presence of taxa without relying on available sequenced genomes.