Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway.
Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.
MAbs. 2022 Jan-Dec;14(1):2031482. doi: 10.1080/19420862.2022.2031482.
Generative machine learning (ML) has been postulated to become a major driver in the computational design of antigen-specific monoclonal antibodies (mAb). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework, which incorporates a wide range of physiological antibody-binding parameters. The simulation framework enables the computation of synthetic antibody-antigen 3D structures, and it functions as an oracle for unrestricted prospective evaluation and benchmarking of the design parameters of ML-generated antibody sequences. We found that a deep generative model trained exclusively on antibody sequence (one-dimensional: 1D) data can be used to design conformational (three-dimensional: 3D) epitope-specific antibodies, matching or exceeding the training dataset in the variety of affinity and developability parameter values. Furthermore, we established a lower threshold of sequence diversity necessary for high-accuracy generative antibody ML and demonstrated that this lower threshold also holds on experimental real-world data. Finally, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Our work establishes the a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.