Turakhia Yatish, Thornlow Bryan, Hinrichs Angie S, De Maio Nicola, Gozashti Landen, Lanfear Robert, Haussler David, Corbett-Detig Russell
bioRxiv. 2020 Sep 28:2020.09.26.314971. doi: 10.1101/2020.09.26.314971.
As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of "genomic contact tracing", that is, using viral genome sequences to trace local transmission dynamics. However, because the viral phylogeny is already so large, and will undoubtedly grow manyfold, placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient, tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach improves the speed of phylogenetic placement of new samples and of data visualization by orders of magnitude, making it possible to complete the placements under real-time constraints. Our method also provides the key ingredient for maintaining a fully updated reference phylogeny. We make these tools available to the research community through the UCSC SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for laboratories worldwide.
UShER is available to users through the UCSC Genome Browser at https://genome.ucsc.edu/cgi-bin/hgPhyloPlace . The source code and detailed instructions on how to compile and run UShER are available from https://github.com/yatisht/usher .
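The abstract does not spell out the data structure, so the following is only a minimal sketch of the general idea, assuming a tree in which each node stores the mutations inferred on the branch leading to it, and a new sample is attached where the fewest extra mutations are needed. The names `Node`, `placement_costs`, and `best_placement`, the toy tree, and the mutation labels are illustrative assumptions and are not taken from the UShER source code. The cost function is a deliberate simplification that ignores back mutations and does not consider splitting an existing branch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: stores only the mutations on its own branch."""
    name: str
    mutations: frozenset = frozenset()
    children: list = field(default_factory=list)


def placement_costs(root, sample_mutations):
    """Return (cost, node_name) for attaching the sample as a child of each node.

    cost = number of mutations that differ between the sample and the
    genotype accumulated from the root down to that node.
    """
    sample = frozenset(sample_mutations)
    results = []
    stack = [(root, frozenset())]
    while stack:
        node, inherited = stack.pop()
        # Simplification: a node's genotype is the union of all mutations
        # on the path from the root (no reversions are modelled).
        genotype = inherited | node.mutations
        results.append((len(genotype ^ sample), node.name))
        for child in node.children:
            stack.append((child, genotype))
    return results


def best_placement(root, sample_mutations):
    """Pick the node with the lowest cost (ties broken by node name)."""
    return min(placement_costs(root, sample_mutations))


# Toy tree with illustrative mutation labels (assumed data, not real results).
a1 = Node("A1", frozenset({"A23403G"}))
a = Node("A", frozenset({"C241T"}), [a1])
b = Node("B", frozenset({"G11083T"}))
root = Node("root", frozenset(), [a, b])

new_sample = {"C241T", "A23403G", "C14408T"}
print(best_placement(root, new_sample))
```

Because each node holds only its own branch mutations, a single traversal is enough to score every candidate attachment point without recomputing the tree or re-reading full genome alignments, which is the kind of saving that makes placement onto a very large tree feasible.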