An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies.

Authors

Wang Yiquan, Lv Huibin, Lei Ruipeng, Yeung Yuen-Hei, Shen Ivana R, Choi Danbi, Teo Qi Wen, Tan Timothy J C, Gopal Akshita B, Chen Xin, Graham Claire S, Wu Nicholas C

Affiliations

Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.

Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.

Publication information

bioRxiv. 2023 Sep 14:2023.09.11.557288. doi: 10.1101/2023.09.11.557288.

DOI:10.1101/2023.09.11.557288

PMID:37745338

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10515799/

Abstract

Despite decades of antibody research, it remains challenging to predict the specificity of an antibody solely based on its sequence. Two major obstacles are the lack of appropriate models and inaccessibility of datasets for model training. In this study, we curated a dataset of >5,000 influenza hemagglutinin (HA) antibodies by mining research publications and patents, which revealed many distinct sequence features between antibodies to HA head and stem domains. We then leveraged this dataset to develop a lightweight memory B cell language model (mBLM) for sequence-based antibody specificity prediction. Model explainability analysis showed that mBLM captured key sequence motifs of HA stem antibodies. Additionally, by applying mBLM to HA antibodies with unknown epitopes, we discovered and experimentally validated many HA stem antibodies. Overall, this study not only advances our molecular understanding of antibody response to influenza virus, but also provides an invaluable resource for applying deep learning to antibody research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/125a/10515799/a53ff58c303b/nihpp-2023.09.11.557288v1-f0001.jpg

Similar articles

An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies.

bioRxiv. 2023 Sep 14:2023.09.11.557288. doi: 10.1101/2023.09.11.557288.

Immunity. 2024 Oct 8;57(10):2453-2465.e7. doi: 10.1016/j.immuni.2024.07.022. Epub 2024 Aug 19.

Mutations in Influenza A Virus Neuraminidase and Hemagglutinin Confer Resistance against a Broadly Neutralizing Hemagglutinin Stem Antibody.

J Virol. 2019 Jan 4;93(2). doi: 10.1128/JVI.01639-18. Print 2019 Jan 15.

Stringent and complex sequence constraints of an IGHV1-69 broadly neutralizing antibody to influenza HA stem.

Cell Rep. 2023 Nov 28;42(11):113410. doi: 10.1016/j.celrep.2023.113410. Epub 2023 Nov 16.

Conformational Stability of the Hemagglutinin of H5N1 Influenza A Viruses Influences Susceptibility to Broadly Neutralizing Stem Antibodies.

J Virol. 2018 May 29;92(12). doi: 10.1128/JVI.00247-18. Print 2018 Jun 15.

Mapping of a Novel H3-Specific Broadly Neutralizing Monoclonal Antibody Targeting the Hemagglutinin Globular Head Isolated from an Elite Influenza Virus-Immunized Donor Exhibiting Serological Breadth.

J Virol. 2020 Feb 28;94(6). doi: 10.1128/JVI.01035-19.

Unmasking Stem-Specific Neutralizing Epitopes by Abolishing N-Linked Glycosylation Sites of Influenza Virus Hemagglutinin Proteins for Vaccine Design.

J Virol. 2016 Sep 12;90(19):8496-508. doi: 10.1128/JVI.00880-16. Print 2016 Oct 1.

Primary antibody response after influenza virus infection is first dominated by low-mutated HA-stem antibodies followed by higher-mutated HA-head antibodies.

Front Immunol. 2022 Nov 3;13:1026951. doi: 10.3389/fimmu.2022.1026951. eCollection 2022.

Influenza A Virus Hemagglutinin Trimer, Head and Stem Proteins Identify and Quantify Different Hemagglutinin-Specific B Cell Subsets in Humans.

Vaccines (Basel). 2021 Jul 2;9(7):717. doi: 10.3390/vaccines9070717.

Influenza virus antibodies inhibit antigen-specific B cell responses in mice.

J Virol. 2024 Sep 17;98(9):e0076624. doi: 10.1128/jvi.00766-24. Epub 2024 Aug 28.

Cited by

Supervised fine-tuning of pre-trained antibody language models improves antigen specificity prediction.

PLoS Comput Biol. 2025 Mar 31;21(3):e1012153. doi: 10.1371/journal.pcbi.1012153. eCollection 2025 Mar.
