Richard A Clay, Pantazes Robert J
Department of Chemical Engineering, Auburn University, Auburn, Alabama, USA.
Proteins. 2025 Apr;93(4):812-830. doi: 10.1002/prot.26773. Epub 2024 Nov 27.
The last few years have seen the rapid proliferation of machine learning methods to design binding proteins. Although these methods have shown large increases in experimental success rates compared to prior approaches, the majority of their predictions fail when they are experimentally tested. It is evident that computational methods still struggle to distinguish the features of real protein binding interfaces from false predictions. Short molecular dynamics simulations of 20 antibody-protein complexes were conducted to identify features of interactions that should occur in binding interfaces. Intermolecular salt bridges, hydrogen bonds, and hydrophobic interactions were evaluated for their persistences, energies, and stabilities during the simulations. It was found that only the hydrogen bonds where both residues are stabilized in the bound complex are expected to persist and meaningfully contribute to binding between the proteins. In contrast, stabilization was not a requirement for salt bridges and hydrophobic interactions to persist. Still, interactions where both residues are stabilized in the bound complex persist significantly longer and have significantly stronger energies than other interactions. Two hundred and twenty real antibody-protein complexes and 8194 decoy complexes were used to train and test a random forest classifier using the features of expected persistent interactions identified in this study and the macromolecular features of interaction energy (IE), buried surface area (BSA), IE/BSA, and shape complementarity. It was compared to a classifier trained only on the expected persistent interaction features and another trained only on the macromolecular features. Inclusion of the expected persistent interaction features reduced the false positive rate of the classifier by two- to five-fold across a range of true positive classification rates.
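The closing comparison, false positive rate at a fixed true positive rate, can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function name `fpr_at_tpr`, the toy score lists, and the convention that a higher score means "more likely a real complex" are all assumptions made for illustration. The scores stand in for whatever probability a classifier (for example, a random forest) assigns to each complex.

```python
import math


def fpr_at_tpr(scores_real, scores_decoy, target_tpr):
    """Return (threshold, fpr) at the strictest threshold reaching target_tpr.

    scores_real  -- classifier scores for real complexes (hypothetical input)
    scores_decoy -- classifier scores for decoy complexes (hypothetical input)
    target_tpr   -- fraction of real complexes that must be accepted, in (0, 1]
    A complex is accepted when its score is >= threshold.
    """
    if not scores_real or not scores_decoy:
        raise ValueError("both score lists must be non-empty")
    if not 0 < target_tpr <= 1:
        raise ValueError("target_tpr must be in (0, 1]")

    ranked = sorted(scores_real, reverse=True)
    # Smallest number of real complexes that satisfies the target rate.
    n_needed = math.ceil(target_tpr * len(ranked))
    threshold = ranked[n_needed - 1]
    # Decoys scoring at or above the threshold are false positives.
    false_positives = sum(1 for s in scores_decoy if s >= threshold)
    return threshold, false_positives / len(scores_decoy)


# Toy scores only; these are not data from the study.
real_scores = [0.9, 0.8, 0.7, 0.4]
decoy_scores = [0.85, 0.6, 0.5, 0.3, 0.2, 0.1, 0.05, 0.01]

for tpr in (0.5, 1.0):
    threshold, fpr = fpr_at_tpr(real_scores, decoy_scores, tpr)
    print(f"TPR >= {tpr:.2f}: threshold = {threshold:.2f}, FPR = {fpr:.3f}")
```

Running the same function on the scores of two classifiers at the same target rate and dividing the two false positive rates gives the kind of fold reduction the abstract reports.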