CovTransformer: A transformer model for SARS-CoV-2 lineage frequency forecasting.

Author information

Feng Yinan, Goldberg Emma E, Kupperman Michael, Zhang Xitong, Lin Youzuo, Ke Ruian

Affiliations

Earth and Environmental Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, United States.

Theoretical Biology and Biophysics, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, United States.

Publication information

Virus Evol. 2024 Nov 14;10(1):veae086. doi: 10.1093/ve/veae086. eCollection 2024.

DOI: 10.1093/ve/veae086

PMID: 39659498

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11631054/

Abstract

With hundreds of SARS-CoV-2 lineages circulating in the global population, there is an ongoing need to forecast lineage frequencies and thereby identify rapidly expanding lineages. Accurate forecasts would allow experimental efforts to focus on understanding the pathogenicity of future dominant lineages and characterizing the extent of their immune escape. Here, we first show that the inherent noise and biases in lineage frequency data make a commonly used regression-based approach unreliable. To address this weakness, we constructed a machine learning model for SARS-CoV-2 lineage frequency forecasting, called CovTransformer, based on the transformer architecture. We designed our model to handle challenges such as a limited amount of data with high levels of noise and bias. We first trained and tested the model using data from the UK and the USA, and then tested how well the model generalizes to many other countries and US states. Remarkably, the trained model makes accurate predictions two months into the future, both globally (in 31 countries with high levels of sequencing effort) and at the US-state level. Our model performed substantially better than a widely used forecasting tool, the multinomial regression model implemented in Nextstrain, demonstrating its utility in SARS-CoV-2 monitoring. Assuming that a newly emerged lineage has been identified and assigned, our tests on retrospective data show that the model identifies future dominant lineages on average 7 weeks before they become dominant. Overall, our work demonstrates that transformer models represent a promising approach for SARS-CoV-2 forecasting and pandemic monitoring.
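The regression baseline named in the abstract is a multinomial logistic regression: each lineage i gets an intercept a_i and a growth rate b_i, and its frequency at time t is f_i(t) = exp(a_i + b_i t) / Σ_j exp(a_j + b_j t). The Python sketch below fits that textbook model to weekly sequence counts by plain gradient ascent and then extrapolates it forward. It is an illustration under stated assumptions, not Nextstrain's implementation and not the authors' code: the function names `fit_mlr` and `mlr_frequencies`, the gradient-ascent fitter, its default step size (which assumes times measured in weeks over a short window), and the choice of lineage 0 as a fixed reference are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating multinomial logistic regression (MLR);
# this is NOT Nextstrain's code or API, and not the CovTransformer authors' code.


def mlr_frequencies(intercepts: Sequence[float],
                    growth_rates: Sequence[float],
                    t: float) -> List[float]:
    """Lineage frequencies at time t: softmax over a_i + b_i * t."""
    logits = [a + b * t for a, b in zip(intercepts, growth_rates)]
    top = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def fit_mlr(times: Sequence[float],
            counts: Sequence[Sequence[int]],
            lr: float = 0.1,
            steps: int = 5000) -> Tuple[List[float], List[float]]:
    """Fit intercepts a_i and growth rates b_i by gradient ascent on the
    multinomial log-likelihood sum_t sum_i n_ti * log f_i(t).

    counts[k][i] is the number of sequences of lineage i observed at times[k].
    Lineage 0 is held at a_0 = b_0 = 0 as the reference, since only
    differences between lineages are identifiable.
    Assumption: times are in weeks over a short window; longer windows may
    need a smaller lr (or centred times) for stable convergence.
    """
    n_lineages = len(counts[0])
    a = [0.0] * n_lineages
    b = [0.0] * n_lineages
    grand_total = sum(sum(row) for row in counts)
    for _ in range(steps):
        grad_a = [0.0] * n_lineages
        grad_b = [0.0] * n_lineages
        for t, row in zip(times, counts):
            freqs = mlr_frequencies(a, b, t)
            n_t = sum(row)
            for i in range(n_lineages):
                # d(log-likelihood)/d(logit_i) at time t = observed - expected count
                resid = row[i] - n_t * freqs[i]
                grad_a[i] += resid
                grad_b[i] += resid * t
        # Update every lineage except the fixed reference; dividing by the
        # total count keeps the step size independent of sequencing volume.
        for i in range(1, n_lineages):
            a[i] += lr * grad_a[i] / grand_total
            b[i] += lr * grad_b[i] / grand_total
    return a, b
```

A forecast eight weeks (about two months) past the last observation would then be `mlr_frequencies(a, b, times[-1] + 8)`. Because the fitted log-odds are extended linearly in time, an error in an estimated growth rate b_i grows in proportion to the forecast horizon, which is one way to see why noisy or biased recent counts can distort this kind of regression forecast.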

Similar articles

Variation in the ACE2 receptor has limited utility for SARS-CoV-2 host prediction.

Elife. 2022 Nov 23;11:e80329. doi: 10.7554/eLife.80329.

Harnessing the power of AI: Advanced deep learning models optimization for accurate SARS-CoV-2 forecasting.

PLoS One. 2023 Jul 20;18(7):e0287755. doi: 10.1371/journal.pone.0287755. eCollection 2023.

Genetic diversity and genomic epidemiology of SARS-CoV-2 during the first 3 years of the pandemic in Morocco: comprehensive sequence analysis, including the unique lineage B.1.528 in Morocco.

Access Microbiol. 2024 Oct 7;6(10). doi: 10.1099/acmi.0.000853.v4. eCollection 2024.

Enhancing SARS-CoV-2 Lineage Surveillance through the Integration of a Simple and Direct qPCR-Based Protocol Adaptation with Established Machine Learning Algorithms.

Anal Chem. 2024 Nov 19;96(46):18537-18544. doi: 10.1021/acs.analchem.4c04492. Epub 2024 Nov 4.

Development and evaluation of a machine learning-based in-hospital COVID-19 disease outcome predictor (CODOP): A multicontinental retrospective study.

Elife. 2022 May 17;11:e75985. doi: 10.7554/eLife.75985.

The future of Cochrane Neonatal.

Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

An exploration of challenges associated with machine learning for time series forecasting of COVID-19 community spread using wastewater-based epidemiological data.

Sci Total Environ. 2023 Feb 1;858(Pt 1):159748. doi: 10.1016/j.scitotenv.2022.159748. Epub 2022 Oct 25.

Tracking SARS-CoV-2 variants of concern in wastewater: an assessment of nine computational tools using simulated genomic data.

Microb Genom. 2024 May;10(5). doi: 10.1099/mgen.0.001249.

Emotion Forecasting: A Transformer-Based Approach.

J Med Internet Res. 2025 Mar 18;27:e63962. doi: 10.2196/63962.
