A predictive language model for SARS-CoV-2 evolution.

Author information

Ma Enhao, Guo Xuan, Hu Mingda, Wang Penghua, Wang Xin, Wei Congwen, Cheng Gong

Affiliations

School of Basic Medical Science, Tsinghua University, 30 Shuangqing Rd., Haidian District, Beijing, 100084, China.

Institute of Infectious Diseases, Shenzhen Bay Laboratory, Guangqiao Rd., Guangming District, Shenzhen, Guangdong, 518000, China.

Publication information

Signal Transduct Target Ther. 2024 Dec 23;9(1):353. doi: 10.1038/s41392-024-02066-x.

DOI:10.1038/s41392-024-02066-x

PMID:39710752

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11663983/

Abstract

Modeling and predicting mutations is critical for preparedness against COVID-19 and similar pandemics. However, existing predictive models have yet to integrate the regularity and randomness of viral mutations while keeping data requirements minimal. Here, we develop a non-demanding language model that uses both regularity and randomness to predict candidate SARS-CoV-2 variants and mutations that might prevail. We constructed "grammatical frameworks" of the available S1 sequences for dimension reduction and semantic representation, allowing the model to capture latent regularity. The mutational profile, defined as the frequency of mutations, was introduced into the model to incorporate randomness. With this model, we identified several variants with significantly enhanced viral infectivity and immune evasion and validated them in wet-lab experiments. By inputting sequence data from three different time points, we detected circulating strains or vital mutations for the XBB.1.16, EG.5, JN.1, and BA.2.86 strains before their emergence. Our results also predicted previously unknown variants that may cause future epidemics. Supported by both data validation and experimental evidence, our study presents a fast-responding, concise, and promising language model, potentially generalizable to other viral pathogens, that forecasts viral evolution and detects crucial mutation hotspots, thereby providing early warning of emerging variants that might raise public health concern.
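
The abstract defines the mutational profile only as "the frequency of mutations" and does not give the exact computation. As a rough illustration of that idea, the sketch below counts, for each amino-acid substitution relative to a reference, the fraction of aligned sequences that carry it. The function name `mutational_profile`, the per-substitution normalization, and the toy five-residue sequences are all assumptions made for this example, not the authors' method or real S1 data.

```python
from collections import Counter


def mutational_profile(reference, sequences):
    """Map (position, ref_residue, alt_residue) to the fraction of
    sequences carrying that substitution.

    Hypothetical helper: assumes every sequence is already aligned to
    the reference and has the same length. Positions are 1-based.
    """
    counts = Counter()
    for seq in sequences:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the reference")
        for pos, (ref_aa, alt_aa) in enumerate(zip(reference, seq), start=1):
            if alt_aa != ref_aa:
                counts[(pos, ref_aa, alt_aa)] += 1
    total = len(sequences)
    # With no sequences, counts is empty and no division takes place.
    return {mutation: n / total for mutation, n in counts.items()}


# Toy, made-up aligned fragments (not real S1 sequences).
reference = "NLVKN"
sequences = ["NLVKN", "NLVKK", "NFVKK", "NLVKK"]
profile = mutational_profile(reference, sequences)
for (pos, ref_aa, alt_aa), freq in sorted(profile.items()):
    print(f"{ref_aa}{pos}{alt_aa}: {freq:.2f}")
```

A frequency table of this kind could then be used to weight which substitutions are sampled when proposing candidate variants. How the published model actually combines the profile with its "grammatical frameworks" is not described in the abstract.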

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ce6/11663983/590d2a2a3afe/41392_2024_2066_Fig1_HTML.jpg
