Ito Jumpei, Strange Adam, Liu Wei, Joas Gustav, Lytras Spyros, Sato Kei
Division of Systems Virology, Department of Microbiology and Immunology, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan.
International Research Center for Infectious Diseases, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan.
Nat Commun. 2025 May 13;16(1):4236. doi: 10.1038/s41467-025-59422-w.
Successively emerging SARS-CoV-2 variants lead to repeated epidemic surges through escalated fitness (i.e., relative effective reproduction number between variants). Modeling the genotype-fitness relationship enables us to pinpoint the mutations boosting viral fitness and flag high-risk variants immediately after their detection. Here, we present CoVFit, a protein language model adapted from ESM-2, designed to predict variant fitness based solely on spike protein sequences. CoVFit was trained on genotype-fitness data derived from viral genome surveillance and functional mutation assays related to immune evasion. CoVFit successfully ranked the fitness of unknown future variants harboring nearly 15 mutations with informative accuracy. CoVFit identified 959 fitness elevation events throughout SARS-CoV-2 evolution until late 2023. Furthermore, we show that CoVFit is applicable for predicting viral evolution through single amino acid mutations. Our study gives insight into the SARS-CoV-2 fitness landscape and provides a tool for efficiently identifying SARS-CoV-2 variants with higher epidemic risk.
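To make the single amino acid mutation analysis concrete, the following is a minimal sketch of an in silico mutation scan: it enumerates every single substitution of a sequence and ranks them by predicted fitness gain over the parent. It is not the CoVFit code or API. The names `single_mutants`, `rank_mutations`, and `toy_fitness` are hypothetical, and `toy_fitness` is a placeholder scorer with no biological meaning, standing in for a trained sequence-to-fitness model.

```python
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_mutants(seq: str) -> List[Tuple[str, str]]:
    """Return (label, mutated_sequence) for every single amino acid substitution."""
    mutants = []
    for i, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                # Label uses 1-based positions, e.g. "A1F" = A at position 1 replaced by F.
                label = f"{wild_type}{i + 1}{aa}"
                mutants.append((label, seq[:i] + aa + seq[i + 1:]))
    return mutants


def rank_mutations(
    seq: str,
    predict_fitness: Callable[[str], float],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank single substitutions by predicted fitness gain relative to the parent."""
    baseline = predict_fitness(seq)
    gains = [(label, predict_fitness(mutant) - baseline)
             for label, mutant in single_mutants(seq)]
    # Stable sort: ties keep their enumeration order.
    gains.sort(key=lambda item: item[1], reverse=True)
    return gains[:top_k]


def toy_fitness(seq: str) -> float:
    """Hypothetical placeholder scorer (counts F, W, Y); NOT a biological model."""
    return float(sum(seq.count(aa) for aa in "FWY"))


if __name__ == "__main__":
    for label, gain in rank_mutations("ACD", toy_fitness, top_k=3):
        print(label, gain)
```

In practice `predict_fitness` would wrap a trained model that maps a spike protein sequence to a fitness score; the scan itself does not depend on how that score is produced.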