Language models learn to represent antigenic properties of human influenza A(H3) virus.

Authors

Durazzi Francesco, Koopmans Marion P G, Fouchier Ron A M, Remondini Daniel

Affiliations

Department of Physics and Astronomy, University of Bologna, 40127, Bologna, Italy.

Department of Viroscience, Erasmus Medical Centre, 3015 CN, Rotterdam, The Netherlands.

Publication information

Sci Rep. 2025 Jul 1;15(1):21364. doi: 10.1038/s41598-025-03275-2.

Abstract

Because influenza vaccine effectiveness depends on a good antigenic match between the vaccine and circulating viruses, it is important to assess the antigenic properties of newly emerging variants continuously. With the increasing application of real-time pathogen genomic surveillance, a key question is whether antigenic properties can be reliably predicted from influenza virus genomic information. Trained on validated, linked datasets of influenza virus genomic data and wet-lab experimental results, in silico models may be able to predict the immune escape of variants of interest from the protein sequence alone. In this study, we compared several machine-learning methods on three tasks: reconstructing antigenic map coordinates for HA1 protein sequences of influenza A(H3N2) virus, ranking the substitutions responsible for major antigenic changes, and recognizing variants with novel antigenic properties that may warrant future vaccine updates. Methods based on deep learning language models (BiLSTM and ProtBERT) and more classical approaches based solely on genetic distances and physicochemical properties of amino acid sequences performed comparably on the coarser features of the map. The two language models performed better on fine-grained features, such as antigenic change driven by a single amino acid substitution, and in in silico deep mutational scanning experiments that rank the substitutions with the largest impact on antigenic properties. Because the best-performing model that produces protein embeddings is agnostic to the specific pathogen, the presented approach may be applicable to other pathogens.
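The in silico deep mutational scanning mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_predictor` is a hypothetical stand-in for a trained model (such as one built on BiLSTM or ProtBERT embeddings) that maps a sequence to two-dimensional antigenic map coordinates, and `in_silico_dms` is an illustrative name. The sketch assumes that the impact of a substitution can be scored as the Euclidean distance between the predicted coordinates of the mutant and those of the original sequence.

```python
import math
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
Coords = Tuple[float, float]


def toy_predictor(seq: str) -> Coords:
    """Hypothetical placeholder for a trained sequence-to-map model.

    It returns arbitrary, deterministic 2D coordinates so that the
    scanning loop below can run; it carries no biological meaning.
    """
    n = len(seq)
    x = sum((i + 1) * (AMINO_ACIDS.index(a) + 1) for i, a in enumerate(seq)) / n
    y = sum(AMINO_ACIDS.index(a) for a in seq) / n
    return (x, y)


def in_silico_dms(
    seq: str, predict: Callable[[str], Coords]
) -> List[Tuple[str, float]]:
    """Score every single amino acid substitution of `seq`.

    Each mutant is scored by the distance between its predicted map
    position and that of the unmutated sequence (an assumed scoring
    rule). Results are sorted from largest to smallest shift.
    """
    base = predict(seq)
    results = []
    for pos, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # skip the unmutated residue
            mutant = seq[:pos] + aa + seq[pos + 1:]
            shift = math.dist(base, predict(mutant))
            # Label in the usual notation: wild type, 1-based position, mutant
            results.append((f"{wild_type}{pos + 1}{aa}", shift))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


# Usage on a short made-up peptide (not a real HA1 sequence)
ranked = in_silico_dms("ACDK", toy_predictor)
for label, shift in ranked[:3]:
    print(f"{label}: predicted shift {shift:.2f}")
```

With a real model, `predict` would wrap the trained regressor, and the ranked list would be compared against substitutions known from wet-lab data to drive antigenic change.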

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d15e/12219074/df656105c2c7/41598_2025_3275_Fig1_HTML.jpg
