A predictive language model for SARS-CoV-2 evolution.

Author information

Ma Enhao, Guo Xuan, Hu Mingda, Wang Penghua, Wang Xin, Wei Congwen, Cheng Gong

Affiliations

School of Basic Medical Science, Tsinghua University, 30 Shuangqing Rd., Haidian District, Beijing, 100084, China.

Institute of Infectious Diseases, Shenzhen Bay Laboratory, Guangqiao Rd., Guangming District, Shenzhen, Guangdong, 518000, China.

Publication information

Signal Transduct Target Ther. 2024 Dec 23;9(1):353. doi: 10.1038/s41392-024-02066-x.

Abstract

Modeling and predicting mutations are critical for preparedness against COVID-19 and similar pandemics. However, existing predictive models have yet to integrate both the regularity and the randomness of viral mutation while keeping data requirements minimal. Here, we develop a language model with modest data requirements that exploits both regularity and randomness to predict candidate SARS-CoV-2 variants and mutations that might become prevalent. To capture latent regularity, we constructed "grammatical frameworks" of the available S1 sequences for dimension reduction and semantic representation. To incorporate randomness, we introduced into the model a mutational profile, defined as the frequency of mutations. Using this model, we identified several variants and confirmed in wet-lab experiments that they have significantly enhanced viral infectivity and immune evasion. By inputting sequence data from three different time points, we detected the circulating strains or key mutations of XBB.1.16, EG.5, JN.1, and BA.2.86 before these strains emerged. Our results also predicted previously unknown variants that may cause future epidemics. Supported by both data validation and experimental evidence, our study presents a fast-responding, concise, and promising language model, potentially generalizable to other viral pathogens, for forecasting viral evolution and detecting crucial mutation hotspots, thereby providing early warning of emerging variants that might raise public health concern.
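The abstract defines the mutational profile only as "the frequency of mutations," used to bring randomness into the model. The following Python sketch illustrates that idea in a minimal form. It is not the authors' implementation. It assumes amino-acid S1 sequences that are already aligned to a reference of equal length. It counts how often each substitution occurs and then draws a hypothetical candidate variant by sampling mutations in proportion to those frequencies. The function names `mutational_profile` and `sample_candidate` are hypothetical, as are the gap handling (skipping "-" and "X") and the one-mutation-per-site rule. The paper's "grammatical framework" component is not modeled here.

```python
import random
from collections import Counter


def mutational_profile(reference, sequences):
    """Return {label: frequency} for each substitution, e.g. 'N1Y'.

    Assumption: every sequence is pre-aligned to `reference` (same length);
    positions are 1-based; gaps ('-') and unknown residues ('X') are ignored.
    """
    counts = Counter()
    for seq in sequences:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the reference")
        for pos, (ref_aa, alt_aa) in enumerate(zip(reference, seq), start=1):
            if alt_aa != ref_aa and alt_aa not in "-X":
                counts[f"{ref_aa}{pos}{alt_aa}"] += 1
    n = len(sequences)
    # Frequency = fraction of input sequences carrying the substitution.
    return {label: c / n for label, c in counts.items()}


def sample_candidate(reference, profile, n_mutations, rng):
    """Draw up to n_mutations substitutions, weighted by frequency, at distinct sites.

    Returns (mutated_sequence, applied_labels). Hypothetical illustration only.
    """
    seq = list(reference)
    pool = dict(profile)
    applied = []
    while pool and len(applied) < n_mutations:
        labels = sorted(pool)
        choice = rng.choices(labels, weights=[pool[m] for m in labels], k=1)[0]
        pos = int(choice[1:-1])  # label format: <ref><position><alt>
        seq[pos - 1] = choice[-1]
        applied.append(choice)
        # Remove every substitution at this site so each site mutates at most once.
        pool = {m: f for m, f in pool.items() if int(m[1:-1]) != pos}
    return "".join(seq), applied


if __name__ == "__main__":
    # Toy 4-residue "reference" and aligned samples (illustrative data, not real S1).
    profile = mutational_profile("NKSF", ["NKSF", "YKSF", "YKSL", "NKSF"])
    print(profile)  # {'N1Y': 0.5, 'F4L': 0.25}
    print(sample_candidate("NKSF", profile, 1, random.Random(42)))
```

In this sketch, weighting by observed frequency means common substitutions are drawn more often. Rare ones can still appear, which is one simple way to combine observed regularity with stochastic exploration.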

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ce6/11663983/590d2a2a3afe/41392_2024_2066_Fig1_HTML.jpg