For antibody sequence generative modeling, mixture models may be all you need.

Author information

Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093-0359, United States.

MAP Bioscience, La Jolla, CA 92093, United States.

Publication information

Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae278.

DOI:10.1093/bioinformatics/btae278

PMID:38652603

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11093529/

Abstract

MOTIVATION

Antibody therapeutic candidates must exhibit not only tight binding to their target but also good developability properties, especially low risk of immunogenicity.

RESULTS

In this work, we fit a simple generative model, SAM, to sixty million human heavy and seventy million human light chains. We show that the probability of a sequence calculated by the model distinguishes human sequences from other species with the same or better accuracy on a variety of benchmark datasets containing >400 million sequences than any other model in the literature, outperforming large language models (LLMs) by large margins. SAM can humanize sequences, generate new sequences, and score sequences for humanness. It is both fast and fully interpretable. Our results highlight the importance of using simple models as baselines for protein engineering tasks. We additionally introduce a new tool for numbering antibody sequences which is orders of magnitude faster than existing tools in the literature.

AVAILABILITY AND IMPLEMENTATION

All tools developed in this study are available at https://github.com/Wang-lab-UCSD/AntPack.
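The scoring idea in the abstract (a sequence's probability under a generative model serves as a humanness score) can be illustrated with a toy mixture model. The sketch below is not the AntPack API or the authors' implementation. It assumes, for illustration only, that sequences are already aligned to a fixed length and that each mixture component is a product of per-position categorical distributions. The helper `toy_component`, the consensus strings, and all parameter values are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"  # 20 residues plus a gap symbol for aligned positions
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def mixture_log_likelihood(seq, log_weights, log_probs):
    """Log-probability of one aligned sequence under a categorical mixture.

    log_weights[k]        : log mixing weight of component k
    log_probs[k][pos][aa] : log-probability of residue aa at position pos in component k
    """
    per_component = []
    for log_w, component in zip(log_weights, log_probs):
        total = log_w
        for pos, aa in enumerate(seq):
            total += component[pos][AA_INDEX[aa]]
        per_component.append(total)
    # Sum over components in log space: log sum_k w_k * prod_pos p_k(aa_pos)
    return log_sum_exp(per_component)


def toy_component(consensus, p_match=0.9):
    """Hypothetical helper: a component peaked on one consensus sequence."""
    p_other = (1.0 - p_match) / (len(AMINO_ACIDS) - 1)
    return [
        [math.log(p_match if aa == c else p_other) for aa in AMINO_ACIDS]
        for c in consensus
    ]


# Toy two-component model over length-5 "sequences" (illustrative values only,
# not parameters fitted to any real antibody data).
log_weights = [math.log(0.5), math.log(0.5)]
log_probs = [toy_component("EVQLV"), toy_component("QVQLQ")]

# A sequence close to a component's consensus scores higher than a distant one.
score_close = mixture_log_likelihood("EVQLV", log_weights, log_probs)
score_far = mixture_log_likelihood("WWWWW", log_weights, log_probs)
print(score_close > score_far)
```

In a model of this kind every parameter is a plain probability for a residue at a position, which is one way to read the abstract's claim that the model is fully interpretable. In practice the weights and per-position probabilities would be fitted to the training sequences rather than set by hand.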
