Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China.
Beijing Institute of Mathematical Sciences and Applications (BIMSA), Beijing 101408, China.
Genes (Basel). 2024 Jul 7;15(7):891. doi: 10.3390/genes15070891.
The highly variable SARS-CoV-2 virus responsible for the COVID-19 pandemic frequently undergoes mutations, leading to the emergence of new variants that present novel threats to public health. The determination of these variants often relies on manual definition based on local sequence characteristics, resulting in delays in their detection relative to their actual emergence. In this study, we propose an algorithm for the automatic identification of novel variants. By leveraging the optimal natural metric for viruses based on an alignment-free perspective to measure distances between sequences, we devise a hypothesis testing framework to determine whether a given viral sequence belongs to a novel variant. Our method demonstrates high accuracy, achieving nearly 100% precision in identifying new variants of SARS-CoV-2 and HIV-1 as well as in detecting novel genera in Orthocoronavirinae. This approach holds promise for timely surveillance and management of emerging viral threats in the field of public health.
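The two-step idea in the abstract (an alignment-free distance between sequences, followed by a hypothesis test for novelty) can be illustrated with a minimal sketch. The abstract does not specify the natural metric or the test statistic, so everything below is an assumption made for illustration: normalized k-mer frequencies with Euclidean distance stand in for the metric, and an empirical p-value against within-variant nearest-neighbour distances stands in for the test. The names `kmer_profile` and `is_novel`, and the parameters `k` and `alpha`, are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=2):
    """Normalized k-mer frequency vector over A/C/G/T.

    Stand-in alignment-free feature (an assumption, not the paper's metric).
    k-mers containing other symbols (e.g. N) are ignored.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def is_novel(query, known, alpha=0.05, k=2):
    """Test whether `query` looks like a member of any known variant.

    known: dict mapping variant name -> list of at least two sequences.
    Returns (novel, p_values), where novel is True only if the hypothesis
    "query belongs to this variant" is rejected for every known variant.
    """
    q = kmer_profile(query, k)
    p_values = {}
    for name, seqs in known.items():
        profiles = [kmer_profile(s, k) for s in seqs]
        # Null distribution: leave-one-out nearest-neighbour distance
        # of each member to the rest of its own variant.
        null = [
            min(math.dist(p, o) for j, o in enumerate(profiles) if j != i)
            for i, p in enumerate(profiles)
        ]
        # Observed statistic: distance from the query to its nearest member.
        d_q = min(math.dist(q, p) for p in profiles)
        # Empirical p-value with the usual +1 correction.
        exceed = sum(d >= d_q for d in null)
        p_values[name] = (1 + exceed) / (1 + len(null))
    novel = all(p < alpha for p in p_values.values())
    return novel, p_values
```

With the +1 correction, the smallest attainable p-value for a variant with n reference sequences is 1/(n+1), so a threshold such as `alpha=0.05` only becomes meaningful once each known variant has at least 20 reference sequences.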