MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London W2 1PG, UK.
European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton CB10 1SD, UK.
Philos Trans R Soc Lond B Biol Sci. 2022 Oct 10;377(1861):20210237. doi: 10.1098/rstb.2021.0237. Epub 2022 Aug 22.
In less than a decade, population genomics of microbes has progressed from the effort of sequencing dozens of strains to thousands, or even tens of thousands of strains in a single study. There are now hundreds of thousands of genomes available even for a single bacterial species, and the number of genomes is expected to continue to increase at an accelerated pace given the advances in sequencing technology and widespread genomic surveillance initiatives. This explosion of data calls for innovative methods to enable rapid exploration of the structure of a population based on different data modalities, such as multiple sequence alignments, assemblies and estimates of gene content across different genomes. Here, we present Mandrake, an efficient implementation of a dimensional reduction method tailored for the needs of large-scale population genomics. Mandrake is capable of visualizing population structure from millions of whole genomes, and we illustrate its usefulness with several datasets representing major pathogens. Our method is freely available both as an analysis pipeline (https://github.com/johnlees/mandrake) and as a browser-based interactive application (https://gtonkinhill.github.io/mandrake-web/). This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.