School of Life Sciences and Department of Statistics, University of Warwick, Coventry CV4 7AL, UK.
Department of Veterinary Medicine, University of Cambridge, Cambridge CB3 0ES, UK.
Philos Trans R Soc Lond B Biol Sci. 2022 Oct 10;377(1861):20210246. doi: 10.1098/rstb.2021.0246. Epub 2022 Aug 22.
Recent years have seen a remarkable increase in the practicality of sequencing whole genomes from large numbers of bacterial isolates. The availability of these data has huge potential to deliver new insights into the evolution and epidemiology of bacterial pathogens, but the scalability of the analytical methodology has lagged behind that of the sequencing technology. Here we present a step-by-step approach for such large-scale genomic epidemiology analyses, from bacterial genomes to epidemiological interpretations. A central component of this approach is the dated phylogeny, which is a phylogenetic tree with branch lengths measured in units of time. The construction of dated phylogenies from bacterial genomic data must account for the disruptive effect of recombination on phylogenetic relationships, and we describe how this can be achieved. Dated phylogenies can then be used to perform fine-scale or large-scale epidemiological analyses, depending on the proportion of cases for which genomes are available. A key feature of this approach is computational scalability, in particular the ability to process hundreds or thousands of genomes within a matter of hours. This is a clear advantage of the step-by-step approach described here. We discuss other advantages and disadvantages of the approach, as well as potential improvements and avenues for future research. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
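To make the notion of a dated phylogeny concrete, the following is a minimal sketch in Python and is not taken from the article. It assumes a toy tree whose branch lengths are already expressed in years and a known root date. The `Node` class, the `node_dates` helper, and all names and dates are hypothetical and chosen only for illustration. Because branch lengths are in units of time, the calendar date of any node is the root date plus the sum of branch lengths on the path from the root.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One node of a toy dated phylogeny (hypothetical structure)."""
    name: str
    branch_years: float = 0.0  # length of the branch to the parent, in years
    children: List["Node"] = field(default_factory=list)


def node_dates(root: Node, root_date: float) -> Dict[str, float]:
    """Return the calendar date (decimal years) of every node in the tree."""
    dates: Dict[str, float] = {}
    stack = [(root, root_date)]
    while stack:
        node, parent_date = stack.pop()
        # The root has branch_years == 0, so it is placed at root_date itself.
        date = parent_date + node.branch_years
        dates[node.name] = date
        for child in node.children:
            stack.append((child, date))
    return dates


# Illustrative tree: three sampled isolates (t1, t2, t3) and two ancestors.
tree = Node("root", children=[
    Node("A", 3.0, children=[Node("t1", 2.5), Node("t2", 4.0)]),
    Node("t3", 8.25),
])

dates = node_dates(tree, root_date=2010.0)
for name in sorted(dates):
    print(name, dates[name])
```

In this sketch the tip dates correspond to isolate sampling dates, and the internal node dates correspond to the estimated times of common ancestors. In a real analysis both the branch lengths and the root date would be inferred from the genomic data rather than supplied by hand.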