Hahn Georg, Cho Michael H, Weiss Scott T, Silverman Edwin K, Lange Christoph
bioRxiv. 2020 Jun 30:2020.06.22.165936. doi: 10.1101/2020.06.22.165936.
Research efforts during the ongoing SARS-CoV-2 pandemic have focused on viral genome sequence analysis to understand how the virus spread across the globe. Here, we assess three SARS-CoV-2 genomes identified in Beijing in June 2020 and made available in the GISAID database, and we attempt to determine their origin. The database contains fully or partially sequenced SARS-CoV-2 samples from laboratories around the world. Including the three new samples and excluding samples with missing annotations, we analyzed 7,643 SARS-CoV-2 genomes. We applied principal component analysis to a similarity matrix that uses the Jaccard index to compare all pairs of SARS-CoV-2 nucleotide sequences at all loci simultaneously. We find that the newly discovered virus genomes from Beijing fall in a genetic cluster that consists mostly of cases from Europe and South(east) Asia. The sequences of the new cases are most closely related to virus genomes from a small number of cases from China (March 2020), cases from Europe (February to early May 2020), and cases from South(east) Asia (May to June 2020). These findings could suggest that the original cases of this genetic cluster originated in China in March 2020 and were re-introduced to China by transmission from cases in South(east) Asia between April and June 2020.
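The abstract's method, a principal component analysis on a Jaccard similarity matrix, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each genome has already been encoded as a binary vector over loci (1 where the sequence differs from a reference, 0 otherwise), and it double-centers the similarity matrix before the eigendecomposition; the abstract specifies neither choice. The function names `jaccard_similarity_matrix` and `principal_components` and the toy data are hypothetical.

```python
import numpy as np

def jaccard_similarity_matrix(X):
    """Pairwise Jaccard index between the rows of a binary matrix X.

    X has shape (n_samples, n_loci); the 0/1 encoding is an assumption.
    """
    X = np.asarray(X, dtype=float)
    inter = X @ X.T                      # |A ∩ B| for every pair of rows
    counts = X.sum(axis=1)               # |A| for every row
    union = counts[:, None] + counts[None, :] - inter
    # Two rows with no variants at all have an empty union; treat them as identical.
    safe_union = np.where(union > 0, union, 1.0)
    return np.where(union > 0, inter / safe_union, 1.0)

def principal_components(S, k=2):
    """Top-k principal coordinates of a symmetric similarity matrix S.

    Double-centering S first is an assumption of this sketch.
    """
    n = S.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = H @ S @ H
    eigvals, eigvecs = np.linalg.eigh(B) # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

if __name__ == "__main__":
    # Hypothetical toy data: six "genomes", six loci, two obvious groups.
    X = np.array([
        [1, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [1, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 0],
        [0, 0, 0, 1, 1, 1],
    ])
    J = jaccard_similarity_matrix(X)
    print(principal_components(J, k=2))
```

In this toy example the first component takes one sign for the first three rows and the opposite sign for the last three, which is the kind of separation that lets genetic clusters be read off a plot of the leading components.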