Hu Xiaowen, Guan Siqin, He Yiliang, Yi Guohui, Yao Lei, Zhang Jiaming
Key Laboratory of Microbiology of Hainan, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Haikou, China.
Institute of South Subtropical Crops, Chinese Academy of Tropical Agricultural Sciences, Zhanjiang, China.
Bio Protoc. 2024 Mar 20;14(6):e4955. doi: 10.21769/BioProtoc.4955.
Estimating the time to the most recent common ancestor (tMRCA) is important for tracing the origin of pathogenic viruses. The analysis is based on the genetic diversity accumulated over a given time period. Thousands of mutation sites have arisen in SARS-CoV-2 genomes since the COVID-19 pandemic started; six highly linked mutation sites arose early, before the start of the pandemic, and can be used to classify the genomes into three main haplotypes. Tracing the origin of those three haplotypes may help to understand the origin of SARS-CoV-2. In this article, we present a complete protocol for classifying SARS-CoV-2 genomes and calculating tMRCA using a Bayesian phylodynamic method. This protocol may also be used in the analysis of other viral genomes.
Key features
• Filtering and alignment of a massive number of viral genomes using custom scripts and ViralMSA.
• Classification of genomes based on highly linked sites using custom scripts.
• Phylodynamic analysis of viral genomes using Bayesian evolutionary analysis sampling trees (BEAST).
• Visualization of the posterior distribution of tMRCA using Tracer v1.7.2.
• Optimized for SARS-CoV-2.
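The classification step can be illustrated with a minimal Python sketch. It is not the authors' custom script: the alignment columns, allele patterns, haplotype names (H1 to H3), and toy sequences below are all hypothetical placeholders, and only two sites are used instead of six to keep the example short. It assumes the genomes are already aligned to a common reference, so that a given column index refers to the same genomic position in every sequence.

```python
from collections import Counter

# Hypothetical 1-based alignment columns and allele patterns.
# These are placeholders, NOT the linked sites used in the protocol.
HAPLOTYPE_PATTERNS = {
    "H1": {3: "C", 7: "T"},
    "H2": {3: "T", 7: "T"},
    "H3": {3: "T", 7: "C"},
}


def classify_genome(aligned_seq, patterns=HAPLOTYPE_PATTERNS):
    """Return the first haplotype whose alleles all match, else 'unclassified'."""
    seq = aligned_seq.upper()
    for name, alleles in patterns.items():
        if all(
            pos <= len(seq) and seq[pos - 1] == base
            for pos, base in alleles.items()
        ):
            return name
    # Ambiguous bases (e.g. N), gaps, or unexpected allele combinations end up here.
    return "unclassified"


def classify_alignment(records, patterns=HAPLOTYPE_PATTERNS):
    """Classify every sequence in a {sequence_id: aligned_sequence} mapping."""
    return {seq_id: classify_genome(seq, patterns) for seq_id, seq in records.items()}


# Toy aligned sequences (hypothetical data for illustration only).
toy_alignment = {
    "genome_a": "AACGGATTA",
    "genome_b": "AATGGATTA",
    "genome_c": "aatggacta",
    "genome_d": "AANGGATTA",
}

assignments = classify_alignment(toy_alignment)
counts = Counter(assignments.values())
print(assignments)
print(counts)
```

Sequences with an ambiguous base at any diagnostic column are left unclassified rather than assigned to the nearest pattern, which is a conservative choice when the resulting groups feed a downstream tMRCA estimate.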