In silico proof of principle of machine learning-based antibody design at unconstrained scale.

Affiliations

Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway.

Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.

Publication information

MAbs. 2022 Jan-Dec;14(1):2031482. doi: 10.1080/19420862.2022.2031482.

Abstract

Generative machine learning (ML) has been postulated to become a major driver in the computational design of antigen-specific monoclonal antibodies (mAb). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework, which incorporates a wide range of physiological antibody-binding parameters. The simulation framework enables the computation of synthetic antibody-antigen 3D-structures, and it functions as an oracle for unrestricted prospective evaluation and benchmarking of antibody design parameters of ML-generated antibody sequences. We found that a deep generative model, trained exclusively on antibody sequence (one dimensional: 1D) data can be used to design conformational (three dimensional: 3D) epitope-specific antibodies, matching, or exceeding the training dataset in affinity and developability parameter value variety. Furthermore, we established a lower threshold of sequence diversity necessary for high-accuracy generative antibody ML and demonstrated that this lower threshold also holds on experimental real-world data. Finally, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Our work establishes a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee3c/8986205/b9bf2b5c2412/KMAB_A_2031482_F0001_OC.jpg