Li Lun, Li Cuiping, Li Na, Zou Dong, Zhao Wenming, Luo Hong, Xue Yongbiao, Zhang Zhang, Bao Yiming, Song Shuhui
China National Center for Bioinformation, Beijing, 100101, China.
National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China.
Adv Sci (Weinh). 2024 Dec;11(45):e2405058. doi: 10.1002/advs.202405058. Epub 2024 Oct 14.
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has given rise to many high-risk variants, driving repeated waves of COVID-19 in recent years. Accurate early warning of high-risk variants is therefore vital for epidemic prevention and control. However, detecting high-risk variants through experimental and epidemiological research is time-consuming and often lags behind the emergence and spread of these variants. In this study, HiRisk-Detector, a machine learning algorithm based on haplotype networks, is developed for the early computational detection of high-risk SARS-CoV-2 variants. Leveraging over 7.6 million high-quality, complete SARS-CoV-2 genomes and their metadata, the effectiveness, robustness, and generalizability of HiRisk-Detector are validated. First, HiRisk-Detector is evaluated on actual empirical data and successfully detects all 13 high-risk variants, on average 27 days before the corresponding World Health Organization announcements. Second, its robustness is tested by reducing sequencing intensity to one-fourth, which causes only a minimal delay of 3.8 days, demonstrating that it remains effective under sparser sequencing. Third, HiRisk-Detector is applied to detect risks among sub-lineages of the SARS-CoV-2 Omicron variant, confirming its broad applicability with high ROC-AUC and PR-AUC performance. Overall, HiRisk-Detector offers a powerful capacity for the early detection of high-risk variants and could be of great utility in any public emergency caused by infectious diseases or viruses.
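The abstract reports two kinds of evaluation: an average early-warning lead time relative to WHO announcements, and ROC-AUC for risk scores. The following is a minimal, hypothetical sketch of how such metrics can be computed; it is not the authors' evaluation code. The function names, the variant labels, and all dates and scores below are illustrative assumptions. ROC-AUC is computed here with the standard Mann-Whitney pairwise formulation.

```python
from datetime import date
from statistics import mean


def mean_lead_time_days(detections, announcements):
    """Average number of days by which detection precedes announcement.

    Hypothetical helper: both arguments map a variant label to a date.
    A positive value means the variant was flagged before it was announced.
    """
    leads = [(announcements[v] - detections[v]).days for v in announcements]
    return mean(leads)


def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney U statistic); tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example data (not taken from the study).
detected = {"A": date(2021, 1, 1), "B": date(2021, 3, 1)}
announced = {"A": date(2021, 1, 31), "B": date(2021, 3, 25)}
print(mean_lead_time_days(detected, announced))     # leads of 30 and 24 days
print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 3 of 4 pairs ranked correctly
```

With the made-up dates above, the two lead times are 30 and 24 days, giving a mean of 27 days; the toy scores rank three of the four positive/negative pairs correctly, giving an ROC-AUC of 0.75. PR-AUC would be computed analogously from a precision-recall curve.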