Ma Siyuan, Li Hongzhe
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA.
Methods Mol Biol. 2023;2629:231-245. doi: 10.1007/978-1-0716-2986-4_11.
A microbial strain is interpreted as a lineage derived from a recent ancestor that has not experienced "too many" recombination events and that can be successfully retrieved with culture-independent techniques using metagenomic sequencing. Strain-level variability has increasingly been shown to produce additional phenotypic heterogeneity that affects host health, such as differences in virulence, transmissibility, and antibiotic resistance. New statistical and computational methods have recently been developed to track strains in samples from shotgun metagenomics data, using either reference genome sequences or metagenome-assembled genomes (MAGs). In this paper, we review some recent statistical methods for strain identification based on frequency counts at a set of single nucleotide variants (SNVs) within a set of single-copy marker genes. These methods differ in whether reference genome sequences are needed, how SNVs are called, which deconvolution methods are used, and whether the methods can be applied to multiple samples. We conclude our review with areas that require further research.