Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA.
Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Nat Genet. 2021 Jun;53(6):809-816. doi: 10.1038/s41588-021-00862-7. Epub 2021 May 10.
As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of 'genomic contact tracing', that is, using viral genomes to trace local transmission dynamics. However, because the viral phylogeny is already so large (and will undoubtedly grow manyfold), placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach greatly improves the speed of phylogenetic placement of new samples and of data visualization, making it possible to complete placements within the time constraints of real-time contact tracing. Thus, our method addresses an important need for maintaining a fully updated reference phylogeny. We make these tools available to the research community through the University of California Santa Cruz SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for SARS-CoV-2 in laboratories worldwide.
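The abstract does not spell out how the tree-based data structure or the placement step works. As a rough illustration only, the sketch below assumes a tree whose branches are annotated with the mutations they carry, and places a new sample at the existing node that requires the fewest additional mutations (a maximum-parsimony-style criterion). The names `Node`, `placement_cost`, and `place_sample`, the toy tree topology, and the allele values are all hypothetical; this is not the authors' implementation, and it omits details a real tool must handle, such as placement partway along a branch, reversions, and missing data.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed representation (not taken from the paper): a mutation is a
# (genome_position, derived_allele) pair, and positions absent from a
# genotype carry the reference allele. Reversions are not modeled.
Mutation = tuple[int, str]


@dataclass
class Node:
    name: str
    # Mutations on the branch from the parent to this node.
    mutations: list[Mutation] = field(default_factory=list)
    children: list[Node] = field(default_factory=list)

    def add_child(self, child: Node) -> Node:
        self.children.append(child)
        return child


def placement_cost(genotype: dict[int, str], sample: dict[int, str]) -> int:
    """Count sites where the sample's allele differs from the node's allele."""
    sites = genotype.keys() | sample.keys()
    return sum(1 for site in sites if genotype.get(site) != sample.get(site))


def place_sample(root: Node, sample: dict[int, str]) -> tuple[Node, int]:
    """Return the node whose inferred genotype needs the fewest extra
    mutations to explain the sample (a parsimony-style criterion)."""
    best_node, best_cost = root, None
    # Depth-first traversal: each node's genotype is its parent's genotype
    # updated with the mutations on its incoming branch.
    stack: list[tuple[Node, dict[int, str]]] = [(root, {})]
    while stack:
        node, parent_genotype = stack.pop()
        genotype = dict(parent_genotype)
        genotype.update(node.mutations)
        cost = placement_cost(genotype, sample)
        if best_cost is None or cost < best_cost:
            best_node, best_cost = node, cost
        for child in node.children:
            stack.append((child, genotype))
    return best_node, best_cost


# Toy tree with a made-up topology, for illustration only.
root = Node("root")
a = root.add_child(Node("A", [(241, "T"), (3037, "T")]))
a.add_child(Node("B", [(23403, "G")]))
root.add_child(Node("C", [(28881, "A")]))

sample = {241: "T", 3037: "T", 23403: "G", 14408: "T"}
node, cost = place_sample(root, sample)
print(node.name, cost)  # B 1
```

Storing only per-branch mutations, rather than a full genome at every node, keeps such a tree compact; each node's genotype is rebuilt on the fly during a single traversal, so every node is visited once per placement.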