The Francis Crick Institute, London, United Kingdom.
eLife. 2022 Nov 15;11:e82392. doi: 10.7554/eLife.82392.
The COVID-19 pandemic has resulted in a step change in the scale of sequencing data, with more genomes of SARS-CoV-2 having been sequenced than of any other organism on earth. When represented as a phylogenetic tree, these sequences reveal key insights: the tree captures the evolutionary history of the virus and allows the identification of transmission events and the emergence of new variants. However, existing web-based tools for exploring phylogenies do not scale to the size of the datasets now available for SARS-CoV-2. We have developed Taxonium, a new tool that uses WebGL to allow the exploration of trees with tens of millions of nodes in the browser for the first time. Taxonium links each node to associated metadata and supports mutation-annotated trees, which are able to capture all known genetic variation in a dataset. It can be run entirely locally in the browser, from a server-based backend, or as a desktop application. We describe insights that analysing a tree of five million sequences can provide into SARS-CoV-2 evolution, and provide a tool at cov2tree.org for exploring a public tree of more than five million SARS-CoV-2 sequences. Taxonium can be applied to any tree, and is available at taxonium.org, with source code at github.com/theosanderson/taxonium.
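To illustrate how a mutation-annotated tree can capture the genetic variation in a dataset, the following is a minimal sketch, not Taxonium's implementation or file format. It assumes each node stores only the mutations on the branch leading to it, so that a node's differences from the root sequence can be recovered by walking from the root down to that node. The `Node` class, the `(position, base)` mutation encoding, and the `genotype` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical minimal encoding of a mutation: (genome position, new base).
Mutation = Tuple[int, str]


@dataclass
class Node:
    """A tree node annotated with the mutations on the branch leading to it."""
    name: str
    mutations: List[Mutation] = field(default_factory=list)
    parent: Optional["Node"] = None


def genotype(node: Node) -> Dict[int, str]:
    """Return the node's differences from the root sequence as position -> base."""
    # Collect the path from the node up to the root.
    path = []
    current = node
    while current is not None:
        path.append(current)
        current = current.parent

    # Apply mutations from the root downwards, so that a later mutation
    # at the same position overrides an earlier one.
    differences: Dict[int, str] = {}
    for ancestor in reversed(path):
        for position, base in ancestor.mutations:
            differences[position] = base
    return differences


# A toy tree with made-up positions and bases.
root = Node("root")
internal = Node("internal", [(100, "T")], parent=root)
leaf_a = Node("leaf_a", [(200, "G")], parent=internal)
leaf_b = Node("leaf_b", [(100, "C"), (300, "A")], parent=internal)
```

In this toy tree, `genotype(leaf_a)` combines the mutation inherited from `internal` with the leaf's own, while `leaf_b` overrides the inherited change at position 100. Storing only per-branch differences is what lets a single tree stand in for every sequence in the dataset without storing each genome in full.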