Dadi Temesgen Hailemariam, Renard Bernhard Y, Wieler Lothar H, Semmler Torsten, Reinert Knut
Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany; International Max Planck Research School for Computational Biology and Scientific Computing (IMPRS-CBSC), Berlin, Germany; Department of Veterinary Medicine, Freie Universität Berlin, Berlin, Germany.
Robert Koch Institute, Berlin, Germany.
PeerJ. 2017 Mar 28;5:e3138. doi: 10.7717/peerj.3138. eCollection 2017.
Identification and quantification of microorganisms is a significant step in studying the alpha and beta diversities within and between microbial communities, respectively. Both identification and quantification of a given microbial community can be carried out using whole genome shotgun sequences with less bias than when using 16S-rDNA sequences. However, DNA regions shared among reference genomes and taxonomic units pose a significant challenge in assigning reads correctly to their true origins. Existing microbial community profiling tools commonly deal with this problem either by preparing signature-based unique references or by assigning an ambiguous read to its least common ancestor in a taxonomic tree. The former method can only make use of reads that map to the curated regions, while the latter suffers from a lack of uniquely mapped reads at lower (more specific) taxonomic ranks. Moreover, even when the tools perform well in calling the organisms present in a sample, there is still room for improvement in determining their correct relative abundances. We present a new method, Species Level Identification of Microorganisms from Metagenomes (SLIMM), which addresses these issues by using coverage information of reference genomes to remove unlikely genomes from the analysis and thereby gain more uniquely mapped reads to assign at lower ranks of a taxonomic tree. SLIMM is based on a few seemingly simple steps which, when combined, create a tool that outperforms state-of-the-art tools in run-time and memory usage while being on par with or better than them in computing quantitative and qualitative information at the species level.
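The general idea in this abstract (discard reference genomes whose coverage is implausibly sparse, then re-check which ambiguous reads have become unique, and send any remaining ambiguous reads to their common ancestor) can be illustrated with a small Python sketch. It is not SLIMM's implementation. The fixed-size bins, the breadth-of-coverage criterion computed from read start positions, the `min_breadth` threshold, all function names, and the toy genomes and taxonomy are hypothetical choices made only for illustration.

```python
from collections import Counter, defaultdict


def breadth_of_coverage(read_starts, genome_length, bin_size):
    """Fraction of fixed-size bins containing at least one read start (assumed criterion)."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    covered_bins = {start // bin_size for start in read_starts}
    return len(covered_bins) / n_bins


def lineage(taxon, parent):
    """Path from a taxon up to the root (the root has no entry in `parent`)."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa, parent):
    """Deepest taxon shared by the lineages of all given taxa."""
    taxa = list(taxa)
    shared = set(lineage(taxa[0], parent))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon, parent))
    # Walk up from the first taxon; the first shared node is the deepest one.
    return next(t for t in lineage(taxa[0], parent) if t in shared)


def profile(alignments, genome_lengths, genome_taxon, parent, bin_size, min_breadth):
    """Hypothetical coverage filter followed by read assignment.

    alignments: read id -> list of (genome id, alignment start), listing
    every position a multi-mapping read aligns to.
    """
    # Step 1: coverage is computed from all alignments, unique and ambiguous.
    starts = defaultdict(list)
    for hits in alignments.values():
        for genome, start in hits:
            starts[genome].append(start)
    # Step 2: drop genomes whose coverage is too sparse to be plausible.
    kept = {
        g for g, length in genome_lengths.items()
        if breadth_of_coverage(starts[g], length, bin_size) >= min_breadth
    }
    # Step 3: re-evaluate each read against the surviving genomes only.
    counts = Counter()
    for hits in alignments.values():
        taxa = {genome_taxon[g] for g, _ in hits if g in kept}
        if not taxa:
            continue  # read hit only discarded genomes
        if len(taxa) == 1:
            counts[next(iter(taxa))] += 1  # now unique at species level
        else:
            counts[lowest_common_ancestor(taxa, parent)] += 1
    return kept, counts


# Toy data (hypothetical): species sA and sB in genus G1, species sC in genus G2.
PARENT = {"sA": "G1", "sB": "G1", "sC": "G2", "G1": "root", "G2": "root"}
GENOME_TAXON = {"gA": "sA", "gB": "sB", "gC": "sC"}
GENOME_LENGTHS = {"gA": 1000, "gB": 1000, "gC": 1000}

alignments = {f"a{i}": [("gA", i * 100)] for i in range(8)}
alignments.update({f"c{i}": [("gC", i * 200)] for i in range(5)})
alignments["s1"] = [("gA", 800), ("gB", 50)]  # region shared by gA and gB
alignments["s2"] = [("gA", 900), ("gB", 60)]
alignments["x1"] = [("gA", 150), ("gC", 300)]  # ambiguous across genera

kept, counts = profile(alignments, GENOME_LENGTHS, GENOME_TAXON, PARENT,
                       bin_size=100, min_breadth=0.3)
print(sorted(kept), dict(counts))  # ['gA', 'gC'] {'sA': 10, 'sC': 5, 'root': 1}
```

In this toy case `gB` is covered only by the two reads from the region it shares with `gA`, so it covers 1 of 10 bins and is removed; those two reads then become unique to species `sA`. Re-running with `min_breadth=0.0` disables the filter, and the same two reads fall back to the genus `G1`, which mirrors the loss of species-level resolution that the abstract attributes to pure common-ancestor assignment.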