Focused learning by antibody language models using preferential masking of non-templated regions.

Author information

Karenna Ng, Bryan Briney

Affiliations

Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA.

Center for Viral Systems Biology, The Scripps Research Institute, La Jolla, CA 92037, USA.

Publication information

Patterns (N Y). 2025 Apr 25;6(6):101239. doi: 10.1016/j.patter.2025.101239. eCollection 2025 Jun 13.

DOI:10.1016/j.patter.2025.101239

PMID:40575131

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12191730/

Abstract

Existing antibody language models (AbLMs) are pre-trained using a masked language modeling (MLM) objective with uniform masking probabilities. While these models excel at predicting germline residues, they often struggle with mutated and non-templated residues, which concentrate in the complementarity-determining regions (CDRs) and are crucial for antigen binding specificity. Here, we demonstrate that preferential masking of the primarily non-templated CDR3 is a compute-efficient strategy to enhance model performance. We pre-trained two AbLMs using either uniform or preferential masking and observed that the latter improves residue prediction accuracy in the highly variable CDR3. Preferential masking also improves antibody classification by native chain pairing and binding specificity, suggesting improved CDR3 understanding and indicating that non-random, learnable patterns help govern antibody chain pairing. We further show that specificity classification is largely informed by residues in the CDRs, demonstrating that AbLMs learn meaningful patterns that align with immunological understanding.
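The abstract contrasts uniform masking with preferential masking of CDR3 but does not state the masking probabilities used. As a rough illustration only, the sketch below keeps the number of masked tokens per sequence fixed (an assumed 15%, the conventional BERT-style rate) and changes only where those tokens fall. Each CDR3 position receives a hypothetical weight cdr3_weight relative to other positions, and positions are drawn by weighted sampling without replacement (the Efraimidis-Spirakis key method). Setting cdr3_weight=1.0 reduces it to uniform masking. The function names, the "<mask>" token string, the weight value, and the toy CDR3 coordinates are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import List, Optional, Sequence

MASK_TOKEN = "<mask>"  # hypothetical mask token string


def sample_mask_positions(
    seq_len: int,
    cdr3: range,
    mask_rate: float = 0.15,  # assumed overall masking rate
    cdr3_weight: float = 1.0,  # hypothetical CDR3 up-weighting factor
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Choose which positions to mask in one sequence.

    cdr3_weight=1.0 gives uniform masking; cdr3_weight > 1.0 makes each CDR3
    position proportionally more likely to be picked (preferential masking).
    The number of masked positions is the same either way.
    """
    if cdr3_weight <= 0:
        raise ValueError("cdr3_weight must be positive")
    rng = rng or random.Random()
    k = max(1, round(mask_rate * seq_len))
    # Efraimidis-Spirakis: give each position the key u ** (1 / w), with u
    # uniform in [0, 1), and keep the k largest keys. This performs weighted
    # sampling without replacement.
    keyed = []
    for i in range(seq_len):
        w = cdr3_weight if i in cdr3 else 1.0
        keyed.append((rng.random() ** (1.0 / w), i))
    keyed.sort(reverse=True)
    return sorted(i for _, i in keyed[:k])


def apply_mask(tokens: Sequence[str], positions: List[int]) -> List[str]:
    """Return a copy of tokens with the chosen positions replaced by MASK_TOKEN."""
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK_TOKEN
    return masked


if __name__ == "__main__":
    rng = random.Random(0)
    seq_len, cdr3 = 120, range(96, 110)  # hypothetical toy sequence layout
    for weight in (1.0, 5.0):
        in_cdr3 = total = 0
        for _ in range(2000):
            pos = sample_mask_positions(seq_len, cdr3, cdr3_weight=weight, rng=rng)
            in_cdr3 += sum(p in cdr3 for p in pos)
            total += len(pos)
        print(f"cdr3_weight={weight}: {in_cdr3 / total:.1%} of masked tokens in CDR3")
```

Because the masking budget stays constant, this style of preferential masking redirects the training signal toward CDR3 without adding masked tokens per sequence. That is one plausible reading of the abstract's "compute-efficient" framing, not a description of the paper's exact procedure.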

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/53c1/12191730/1067c96081a9/gr1.jpg