Department of Mathematics, Simon Fraser University, Burnaby, Canada.
Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, MB, Canada.
BMC Genomics. 2022 Oct 19;23(1):710. doi: 10.1186/s12864-022-08936-4.
The COVID-19 pandemic remains a global public health concern. Advances in sequencing technologies have allowed large volumes of SARS-CoV-2 whole genome sequence (WGS) data to be generated and rapidly shared through global repositories, enabling almost real-time genomic analysis of the pathogen. WGS data has previously been used to group genetically similar viral pathogens and reveal evidence of transmission, including through methods that identify distinct clusters on a phylogenetic tree. Identifying clusters of linked cases can aid regional surveillance and management of the disease. In this study, we present cov2clusters, a novel method for producing stable genomic clusters of SARS-CoV-2 cases, and compare its accuracy and stability with those of previous phylogenetic clustering methods using real-world SARS-CoV-2 sequence data obtained from British Columbia, Canada.
We found that cov2clusters produced more stable clusters than previously used phylogenetic clustering methods when adding sequence data through time, mimicking an increase in sequence data through the pandemic. Our method also showed high accuracy when predicting epidemiologically informed clusters from sequence data.
Our new approach allows for the identification of stable clusters of SARS-CoV-2 from WGS data. Producing high-resolution SARS-CoV-2 clusters from sequence data alone can be a challenge and, where possible, genomic and epidemiological data should be used in combination.