Molecular and Cell Biology Graduate Program, Dartmouth College, Hanover, New Hampshire, USA.
Thayer School of Engineering, Dartmouth College, Hanover, New Hampshire, USA.
Protein Sci. 2024 Sep;33(9):e5127. doi: 10.1002/pro.5127.
The ability to accurately predict antibody-antigen complex structures from their sequences could greatly advance our understanding of the immune system and would aid in the development of novel antibody therapeutics. There have been considerable recent advancements in predicting protein-protein interactions (PPIs) fueled by progress in machine learning (ML). To understand the current state of the field, we compare six representative methods for predicting antibody-antigen complexes from sequence, including two deep learning approaches trained to predict PPIs in general (AlphaFold-Multimer and RoseTTAFold), two composite methods that initially predict antibody and antigen structures separately and dock them (using antibody-mode ClusPro), local refinement in Rosetta (SnugDock) of globally docked poses from ClusPro, and a pipeline combining homology modeling with rigid-body docking informed by ML-based epitope and paratope prediction (AbAdapt). We find that AlphaFold-Multimer outperformed other methods, although the absolute performance leaves considerable room for improvement. AlphaFold-Multimer models of lower quality display significant structural biases at the level of tertiary motifs (TERMs) toward having fewer structural matches in non-antibody-containing structures from the Protein Data Bank (PDB). Specifically, better models exhibit more common PDB-like TERMs at the antibody-antigen interface than worse ones. Importantly, the clear relationship between performance and the commonness of interfacial TERMs suggests that the scarcity of interfacial geometry data in the structural database may currently limit the application of ML to the prediction of antibody-antigen interactions.