

CoVizu: Rapid analysis and visualization of the global diversity of SARS-CoV-2 genomes.

Authors

Ferreira Roux-Cil, Wong Emmanuel, Gugan Gopi, Wade Kaitlyn, Liu Molly, Baena Laura Muñoz, Chato Connor, Lu Bonnie, Olabode Abayomi S, Poon Art F Y

Affiliations

Department of Pathology and Laboratory Medicine, Western University, London, ON, Canada.

Department of Microbiology and Immunology, Western University, London, ON, Canada.

Publication information

Virus Evol. 2021 Nov 8;7(2):veab092. doi: 10.1093/ve/veab092. eCollection 2021 Dec.

Abstract

Phylogenetics has played a pivotal role in the genomic epidemiology of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), for example in tracking the emergence and global spread of variants and in scientific communication. However, the rapid accumulation of genomic data from around the world, with over two million genomes currently available in the Global Initiative on Sharing All Influenza Data (GISAID) database, is testing the limits of standard phylogenetic methods. Here, we describe a new approach to rapidly analyze and visualize large numbers of SARS-CoV-2 genomes. Using Python, genomes are filtered for problematic sites, incomplete coverage, and excessive divergence from a strict molecular clock. All differences from the reference genome, including indels, are extracted using minimap2 and compactly stored as a set of features for each genome. For each Pango lineage (https://cov-lineages.org), we collapse genomes with identical features into 'variants', draw 100 bootstrap samples of the feature set union to generate weights, and compute the symmetric differences between the weighted feature sets for every pair of variants. The resulting distance matrices are used to generate neighbor-joining trees in RapidNJ that are converted into a majority-rule consensus tree for each lineage. Branches with support values below 50 per cent or mean lengths below 0.5 differences are collapsed, and tip labels on affected branches are mapped to internal nodes as directly sampled ancestral variants. Currently, we process about 2 million genomes in approximately 9 h on 52 cores. The resulting trees are visualized using the JavaScript framework D3.js as 'beadplots', in which variants are represented by horizontal line segments, annotated with beads representing samples by collection date. Variants are linked by vertical edges to represent branches in the consensus tree. These visualizations are published at https://filogeneti.ca/CoVizu. All source code was released under an MIT license at https://github.com/PoonLab/covizu.
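
The distance step described in the abstract (collapsing identical genomes into variants, bootstrapping the feature union to obtain weights, and taking weighted symmetric differences) can be illustrated with a short Python sketch. This is not the CoVizu implementation: the function names, the tuple encoding of features, and the sample labels are all hypothetical, and reading a bootstrap weight as "the number of times a feature was drawn" is an assumed interpretation of the abstract. Tree building in RapidNJ and the consensus step are left out.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def collapse_variants(genomes):
    """Group genome labels that share an identical feature set."""
    variants = defaultdict(list)
    for label, features in genomes.items():
        variants[frozenset(features)].append(label)
    return dict(variants)


def bootstrap_weights(union, rng):
    """Resample the feature union with replacement (assumed scheme).

    A feature's weight is the number of times it was drawn, so the
    weights always sum to the size of the union.
    """
    pool = sorted(union, key=repr)  # fixed order for reproducibility
    return Counter(rng.choices(pool, k=len(pool)))


def weighted_symdiff(a, b, weights):
    """Sum the weights of features found in exactly one of a and b."""
    return sum(weights.get(f, 0) for f in a ^ b)


def distance_matrix(variants, weights):
    """Pairwise weighted symmetric differences as a nested list."""
    n = len(variants)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = weighted_symdiff(variants[i], variants[j], weights)
        dist[i][j] = dist[j][i] = d
    return dist


def bootstrap_matrices(genomes, n_boot=100, seed=1):
    """Return the variant list and one distance matrix per replicate."""
    rng = random.Random(seed)
    collapsed = collapse_variants(genomes)
    # Order variants deterministically by their sorted feature reprs.
    variants = sorted(collapsed, key=lambda v: sorted(map(repr, v)))
    union = frozenset().union(*variants)
    matrices = [
        distance_matrix(variants, bootstrap_weights(union, rng))
        for _ in range(n_boot)
    ]
    return variants, matrices


# Hypothetical feature encoding: (type, position, detail), where "~" is
# a substitution and "-" a deletion. Labels are made up for illustration.
genomes = {
    "sample_A": {("~", 240, "T"), ("~", 3036, "T")},
    "sample_B": {("~", 240, "T"), ("~", 3036, "T")},
    "sample_C": {("~", 240, "T"), ("-", 21764, 6)},
}

if __name__ == "__main__":
    variants, matrices = bootstrap_matrices(genomes, n_boot=3)
    print(len(variants), "variants;", len(matrices), "bootstrap matrices")
```

In this toy input, `sample_A` and `sample_B` collapse into a single variant, so each replicate yields a 2 × 2 matrix. In the workflow the abstract describes, each such matrix would be passed to a neighbor-joining program, and the resulting trees summarized as a majority-rule consensus.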


