Anyansi Christine, Straub Timothy J, Manson Abigail L, Earl Ashlee M, Abeel Thomas
Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands.
Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States.
Front Microbiol. 2020 Aug 18;11:1925. doi: 10.3389/fmicb.2020.01925. eCollection 2020.
Metagenomic sequencing is a powerful tool for examining the diversity and complexity of microbial communities. The most widely used tools for taxonomic profiling of metagenomic sequence data provide a species-level overview of community composition. However, individual strains within a species can differ greatly in key genotypic and phenotypic characteristics, such as drug resistance, virulence and growth rate. Therefore, the ability to resolve microbial communities down to the level of individual strains within a species is critical to interpreting metagenomic data for clinical and environmental applications. In these settings, identifying a particular strain, or tracking it across a set of samples, can aid clinical diagnosis and treatment or help characterize as-yet unstudied strains in novel environmental locations. Recently published approaches have begun to tackle the problem of resolving strains within a particular species in metagenomic samples. In this review, we present an overview of these new algorithms and their uses, including methods based on assembly reconstruction and methods operating with or without a reference database. While existing metagenomic analysis methods show reasonable performance at the species and higher taxonomic levels, identifying closely related strains within a species is more challenging because of the diversity of reference databases, the close genetic relatedness of strains, and the differing goals of these analyses. The choice of metagenomic tool for a specific application should be made on a case-by-case basis, as each tool has strengths and weaknesses that affect its performance on specific tasks. Comprehensive benchmarking across different use-case scenarios is vital to validate the performance of these tools on microbial samples. Because strain-level metagenomic analysis is still in its infancy, more fine-grained, high-resolution algorithms will continue to be in demand in the future.