IgLM: Infilling language modeling for antibody sequence design.

Affiliations

Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA.

Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, USA.

Publication information

Cell Syst. 2023 Nov 15;14(11):979-989.e4. doi: 10.1016/j.cels.2023.10.001. Epub 2023 Oct 30.

Abstract

Discovery and optimization of monoclonal antibodies for therapeutic applications relies on large sequence libraries but is hindered by developability issues such as low solubility, high aggregation, and high immunogenicity. Generative language models, trained on millions of protein sequences, are a powerful tool for the on-demand generation of realistic, diverse sequences. We present the Immunoglobulin Language Model (IgLM), a deep generative language model for creating synthetic antibody libraries. Compared with prior methods that leverage unidirectional context for sequence generation, IgLM formulates antibody design based on text-infilling in natural language, allowing it to re-design variable-length spans within antibody sequences using bidirectional context. We trained IgLM on 558 million (M) antibody heavy- and light-chain variable sequences, conditioning on each sequence's chain type and species of origin. We demonstrate that IgLM can generate full-length antibody sequences from a variety of species and its infilling formulation allows it to generate infilled complementarity-determining region (CDR) loop libraries with improved in silico developability profiles. A record of this paper's transparent peer review process is included in the supplemental information.
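
The infilling formulation described in the abstract can be pictured as a rearrangement of the sequence: the span to be redesigned is replaced by a mask, and the span itself is moved to the end, so that a left-to-right model sees the residues on both sides before it generates the span. The following is a minimal sketch of that idea, not the authors' implementation. The token names (`[MASK]`, `[SEP]`, `[ANS]`, `[HEAVY]`, `[HUMAN]`), the tag order, and the function `make_infilling_example` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical special tokens; the real model's vocabulary may differ.
MASK, SEP, ANS = "[MASK]", "[SEP]", "[ANS]"


def make_infilling_example(
    sequence: str,
    start: int,
    end: int,
    chain_tag: str = "[HEAVY]",    # assumed conditioning tag for chain type
    species_tag: str = "[HUMAN]",  # assumed conditioning tag for species
) -> Tuple[List[str], List[str]]:
    """Build (context, target) tokens for infilling sequence[start:end]."""
    if not 0 <= start < end <= len(sequence):
        raise ValueError("span must satisfy 0 <= start < end <= len(sequence)")
    # Context: conditioning tags, then the sequence with the span masked out.
    context = [chain_tag, species_tag, *sequence[:start], MASK, *sequence[end:], SEP]
    # Target: the masked residues, closed by an end-of-answer token.
    target = [*sequence[start:end], ANS]
    return context, target


# Toy amino-acid string, used only to show the rearrangement.
context, target = make_infilling_example("EVQLVESGG", start=3, end=6)
print(" ".join(context))
print(" ".join(target))
```

Because the masked span can have any length and the target ends with its own stop token, the same layout covers variable-length redesigns such as CDR loops of different sizes.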
