Bioinformatics & Genomics Graduate Program, Pennsylvania State University, University Park, PA 16802, USA.
Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, PA 16802, USA.
Biomolecules. 2023 Jan 6;13(1):121. doi: 10.3390/biom13010121.
Protein-protein interactions play a ubiquitous role in biological function. Knowledge of the three-dimensional (3D) structures of the complexes they form is essential for understanding the structural basis of those interactions and how they orchestrate key cellular processes. Computational docking has become an indispensable alternative to the expensive and time-consuming experimental approaches for determining the 3D structures of protein complexes. Despite recent progress, identifying near-native models from a large set of conformations sampled by docking (the so-called scoring problem) still has considerable room for improvement. We present MetaScore, a new machine-learning-based approach to improve the scoring of docked conformations. MetaScore utilizes a random forest (RF) classifier trained to distinguish near-native from non-native conformations using their protein-protein interfacial features. The features include physicochemical properties, energy terms, interaction-propensity-based features, geometric properties, interface topology features, evolutionary conservation, and scores produced by traditional scoring functions (SFs). MetaScore scores docked conformations by simply averaging the score produced by the RF classifier with that produced by any traditional SF. We demonstrate that (i) MetaScore consistently outperforms each of the nine traditional SFs included in this work in terms of success rate and hit rate evaluated over conformations ranked among the top 10; and (ii) an ensemble method, MetaScore-Ensemble, which combines 10 variants of MetaScore obtained by combining the RF score with each of the traditional SFs, outperforms each of the MetaScore variants. We conclude that the performance of traditional SFs can be improved upon by using machine learning to judiciously leverage protein-protein interfacial features and by using ensemble methods to combine multiple scoring functions.
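The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the two scores are placed on a common scale, so the min-max normalization, the assumption that a lower traditional SF score is better, and all function names (`min_max_normalize`, `meta_score`, `top_k`) are hypothetical choices made for illustration only.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float], higher_is_better: bool = True) -> List[float]:
    """Rescale scores to [0, 1] so that 1 always means 'best'.

    ASSUMPTION: min-max scaling is used here only for illustration; the
    abstract does not specify a normalization scheme.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All conformations tie; give each a neutral score.
        return [0.5 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def meta_score(rf_probs: Sequence[float],
               sf_scores: Sequence[float],
               sf_higher_is_better: bool = False) -> List[float]:
    """Average the RF near-native probability with a normalized traditional SF score.

    ASSUMPTION: the traditional SF is energy-like (lower is better) by default.
    """
    if len(rf_probs) != len(sf_scores):
        raise ValueError("rf_probs and sf_scores must have the same length")
    sf_norm = min_max_normalize(sf_scores, sf_higher_is_better)
    return [(p + s) / 2.0 for p, s in zip(rf_probs, sf_norm)]


def top_k(scores: Sequence[float], k: int = 10) -> List[int]:
    """Return the indices of the k highest-scoring conformations, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy data: three docked conformations (values are made up).
rf_probs = [0.9, 0.2, 0.6]           # RF probability of being near-native
sf_scores = [-120.0, -80.0, -100.0]  # energy-like traditional SF, lower is better
combined = meta_score(rf_probs, sf_scores)
ranking = top_k(combined, k=2)
```

In this toy case the first conformation ranks highest because both the RF probability and the traditional SF favor it. Success rate and hit rate over the top 10 would then be computed from such rankings across many docking cases.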