Variant-driven early warning via unsupervised machine learning analysis of spike protein mutations for COVID-19.

Affiliations

Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy.

Scuola Superiore Meridionale, Largo S. Marcellino 10, 80138, Naples, Italy.

Publication information

Sci Rep. 2022 Jun 3;12(1):9275. doi: 10.1038/s41598-022-12442-8.

DOI:10.1038/s41598-022-12442-8

PMID:35661750

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9166699/

Abstract

Never before has such a vast amount of data, including genome sequencing, been collected for a viral pandemic as for COVID-19. This makes it possible to trace the evolution of the virus and to assess, in real time, the role mutations play in its spread within the population. To this end, we focused on the Spike protein because of its central role in mediating viral outbreak and replication in host cells. Employing the Levenshtein distance on Spike protein sequences, we designed a machine learning algorithm that yields a temporal clustering of the available dataset. From this, we were able to identify and define emerging persistent variants that agree with known evidence. Our novel algorithm allowed us to define persistent variants as chains that remain stable over time and to highlight emerging variants of epidemiological interest as branching events that occur over time. Hence, we determined the relationship and temporal connection between variants of interest and the ensuing passage to dominance of the current variants of concern. Remarkably, the analysis and the tools introduced in our work serve as an early warning for the emergence of new persistent variants once the associated cluster reaches 1% of the time-binned sequence data. We validated the approach and its effectiveness on the onset of the Alpha variant of concern. We further predict that the recently identified lineage AY.4.2 ('Delta plus') is causing a new emerging variant. Comparing our findings with epidemiological data, we demonstrated that each new wave is dominated by a new emerging variant, thus confirming the hypothesis of a strong correlation between the birth of variants and the multi-wave temporal pattern of the pandemic. This allows us to introduce the epidemiology of variants, which we describe via the Mutation epidemiological Renormalisation Group framework.
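To make the two quantitative ingredients of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows (i) the Levenshtein (edit) distance between two sequences and (ii) a check for the first time bin in which a cluster reaches 1% of the sequences in that bin. The clustering step itself is not reproduced: the cluster labels are assumed to be given, and the names `levenshtein`, `first_warning`, and the toy labels "A" and "B" are hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def first_warning(bins, threshold=0.01):
    """Map each cluster label to the index of the first time bin in which
    its share of that bin's sequences reaches `threshold`.

    `bins` is a list of time bins; each bin is a list of cluster labels,
    one per sequence (labels are assumed to come from a prior clustering).
    """
    warnings = {}
    for t, labels in enumerate(bins):
        if not labels:
            continue  # skip empty bins to avoid dividing by zero
        total = len(labels)
        for label, n in Counter(labels).items():
            if label not in warnings and n / total >= threshold:
                warnings[label] = t
    return warnings


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    # Toy data: cluster "B" is below 1% in bin 1 and above it in bin 2.
    toy_bins = [["A"] * 100, ["A"] * 199 + ["B"], ["A"] * 98 + ["B"] * 2]
    print(first_warning(toy_bins))
```

In this toy example a hypothetical cluster "B" makes up 0.5% of the second bin and 2% of the third, so the warning is raised at the third bin. How sequences are binned in time and grouped into clusters is described in the paper itself and is deliberately left out of this sketch.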

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d823/9166699/1fe26d2732b2/41598_2022_12442_Fig1_HTML.jpg

Similar articles

Tracking SARS-CoV-2 Spike Protein Mutations in the United States (January 2020-March 2021) Using a Statistical Learning Strategy.

Viruses. 2021 Dec 21;14(1):9. doi: 10.3390/v14010009.

Emergency SARS-CoV-2 Variants of Concern: Novel Multiplex Real-Time RT-PCR Assay for Rapid Detection and Surveillance.

Microbiol Spectr. 2022 Feb 23;10(1):e0251321. doi: 10.1128/spectrum.02513-21.

Haplotype distribution of SARS-CoV-2 variants in low and high vaccination rate countries during ongoing global COVID-19 pandemic in early 2021.

Infect Genet Evol. 2022 Jan;97:105164. doi: 10.1016/j.meegid.2021.105164. Epub 2021 Nov 27.

Developing an Amplification Refractory Mutation System-Quantitative Reverse Transcription-PCR Assay for Rapid and Sensitive Screening of SARS-CoV-2 Variants of Concern.

Microbiol Spectr. 2022 Feb 23;10(1):e0143821. doi: 10.1128/spectrum.01438-21. Epub 2022 Jan 5.

Evolutionary and Phenotypic Characterization of Two Spike Mutations in European Lineage 20E of SARS-CoV-2.

mBio. 2021 Dec 21;12(6):e0231521. doi: 10.1128/mBio.02315-21. Epub 2021 Nov 16.

Initial introduction and spread of the SARS-CoV-2 AY.4.2.1 Delta variant in Bulgaria, a genomic insight.

J Med Virol. 2022 Dec;94(12):6060-6064. doi: 10.1002/jmv.28033. Epub 2022 Aug 10.

Transformations, Lineage Comparisons, and Analysis of Down-to-Up Protomer States of Variants of the SARS-CoV-2 Prefusion Spike Protein, Including the UK Variant B.1.1.7.

Microbiol Spectr. 2021 Sep 3;9(1):e0003021. doi: 10.1128/Spectrum.00030-21. Epub 2021 Aug 4.

Molecular and Serological Characterization of the SARS-CoV-2 Delta Variant in Bangladesh in 2021.

Viruses. 2021 Nov 19;13(11):2310. doi: 10.3390/v13112310.

Emerging SARS-CoV-2 variants can potentially break set epidemiological barriers in COVID-19.

J Med Virol. 2022 Apr;94(4):1300-1314. doi: 10.1002/jmv.27467. Epub 2021 Nov 29.

Cited by

Paying attention to the SARS-CoV-2 dialect: a deep neural network approach to predicting novel protein mutations.

Commun Biol. 2025 Jan 21;8(1):98. doi: 10.1038/s42003-024-07262-7.

Using intrahost single nucleotide variant data to predict SARS-CoV-2 detection cycle threshold values.

PLoS One. 2024 Oct 30;19(10):e0312686. doi: 10.1371/journal.pone.0312686. eCollection 2024.

Data-driven recombination detection in viral genomes.

Nat Commun. 2024 Apr 17;15(1):3313. doi: 10.1038/s41467-024-47464-5.

Redefining pandemic preparedness: Multidisciplinary insights from the CERP modelling workshop in infectious diseases, workshop report.

Infect Dis Model. 2024 Feb 23;9(2):501-518. doi: 10.1016/j.idm.2024.02.008. eCollection 2024 Jun.

Maritime transportation and people mobility in the early diffusion of COVID-19 in Croatia.

Front Public Health. 2023 Aug 17;11:1183047. doi: 10.3389/fpubh.2023.1183047. eCollection 2023.

VariantHunter: a method and tool for fast detection of emerging SARS-CoV-2 variants.

Database (Oxford). 2023 Jul 6;2023. doi: 10.1093/database/baad044.

Using Haplotype-Based Artificial Intelligence to Evaluate SARS-CoV-2 Novel Variants and Mutations.

JAMA Netw Open. 2023 Feb 1;6(2):e230191. doi: 10.1001/jamanetworkopen.2023.0191.

SARS-CoV-2 Omicron Variant Genomic Sequences and Their Epidemiological Correlates Regarding the End of the Pandemic: In Silico Analysis.

JMIR Bioinform Biotechnol. 2023 Jan 10;4:e42700. doi: 10.2196/42700. eCollection 2023.

Genomic Epidemiology of the SARS-CoV-2 Epidemic in Cyprus from November 2020 to October 2021: The Passage of Waves of Alpha and Delta Variants of Concern.

Viruses. 2022 Dec 30;15(1):108. doi: 10.3390/v15010108.

Early detection of variants of concern via funnel plots of regional reproduction numbers.

Sci Rep. 2023 Jan 19;13(1):1052. doi: 10.1038/s41598-022-27116-8.
