Arras Paul, Yoo Han Byul, Pekar Lukas, Clarke Thomas, Friedrich Lukas, Schröter Christian, Schanz Jennifer, Tonillo Jason, Siegmund Vanessa, Doerner Achim, Krah Simon, Guarnera Enrico, Zielonka Stefan, Evers Andreas
Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany.
Institute for Organic Chemistry and Biochemistry, Technical University of Darmstadt, Darmstadt, Germany.
Front Mol Biosci. 2023 Sep 28;10:1249247. doi: 10.3389/fmolb.2023.1249247. eCollection 2023.
In this study, we demonstrate the feasibility of combining yeast surface display (YSD) and next-generation sequencing (NGS) with artificial intelligence and machine learning (AI/ML) methods for the identification of de novo humanized single-domain antibodies (sdAbs) with favorable early developability profiles. The display library was derived from a novel approach in which VHH-based CDR3 regions, obtained from a llama (Lama glama) immunized against NKp46, were grafted onto a humanized VHH backbone library that was diversified in CDR1 and CDR2. Following NGS analysis of sequence pools from two rounds of fluorescence-activated cell sorting, we focused on four sequence clusters selected on the basis of NGS frequency and enrichment analysis as well as in silico developability assessment. For each cluster, long short-term memory (LSTM)-based deep generative models were trained and used for the in silico sampling of new sequences. The sampled sequences were subjected to sequence- and structure-based in silico developability assessment to select a set of fewer than 10 sequences per cluster for production. As demonstrated by binding kinetics and early developability assessment, this procedure represents a general strategy for the rapid and efficient design, from screening selections, of potent and automatically humanized sdAb hits with favorable early developability profiles.
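The frequency and enrichment analysis mentioned above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the pseudocount, and the toy sequences are all assumptions made for illustration, and enrichment is taken here simply as the log2 ratio of a sequence's relative frequency in the second sorting round to that in the first.

```python
from collections import Counter
import math


def frequencies(seqs):
    """Relative frequency of each distinct sequence in one NGS pool."""
    counts = Counter(seqs)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def log2_enrichment(round1, round2, pseudo=1e-6):
    """log2(freq in round 2 / freq in round 1) per sequence.

    `pseudo` is an assumed pseudocount that avoids division by zero
    for sequences seen in only one of the two pools.
    """
    f1 = frequencies(round1)
    f2 = frequencies(round2)
    return {
        s: math.log2((f2.get(s, 0.0) + pseudo) / (f1.get(s, 0.0) + pseudo))
        for s in set(f1) | set(f2)
    }


# Toy pools of hypothetical CDR3 strings (not real data).
round1 = ["ARDYW"] * 1 + ["AKGSW"] * 3
round2 = ["ARDYW"] * 3 + ["AKGSW"] * 1

enrichment = log2_enrichment(round1, round2)
# Rank sequences from most to least enriched between the two rounds.
ranked = sorted(enrichment, key=enrichment.get, reverse=True)
```

In this toy case the sequence that rises from one quarter to three quarters of the pool ranks first. A real analysis would typically also filter on minimum read counts before ranking.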