Department of Biochemistry and Structural Biology, Instituto de Fisiologia Celular, UNAM, Mexico City 04510, Mexico.
C3 consensus, Leon Guanajuato 37266, Mexico.
Int J Mol Sci. 2020 Jul 6;21(13):4787. doi: 10.3390/ijms21134787.
Predicting protein-protein interactions (PPI) is an important challenge in structural bioinformatics. Current computational methods predict these interactions with varying degrees of accuracy. Several factors have been proposed to improve these predictions, including the choice of protein descriptors used to represent the interactions. In the current work, we provide a representation of protein structure, referred to as residue cluster classes, that is amenable to PPI classification using machine learning approaches. Through sampling and optimization, we identified the best algorithm-parameter pair to classify PPI from more than 360 different training sets. We tested these classifiers against PPI datasets that were not included in the training set but shared sequence similarity with proteins in the training set, reproducing the common situation in which a protein shares sequence similarity with others. We identified a model with almost no PPI classification error (96-99% of instances correctly classified) and showed that the residue cluster classes of protein pairs display distinct patterns for positive and negative protein interactions. Our results indicate that residue cluster classes are structural features relevant to modeling PPI and provide a novel tool to mathematically model the protein structure/function relationship.
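As a rough illustration of what selecting an algorithm-parameter pair on held-out data can look like, the sketch below scores a few candidate classifiers on a test set that is disjoint from the training set and keeps the best one. It is not the authors' pipeline: the feature vectors are synthetic stand-ins rather than real residue cluster classes, the two toy classifiers (k-nearest neighbours and nearest centroid) are assumptions chosen for brevity, and all function names are hypothetical. It also uses a single training set, whereas the abstract describes a search over more than 360.

```python
import random
from collections import Counter


def make_toy_pairs(n, seed):
    """Synthetic stand-in for protein-pair feature vectors (NOT real RCC data)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # 1 = interacting pair, 0 = non-interacting pair
        centre = 3.0 if label else 0.0
        vec = [rng.gauss(centre, 1.0) for _ in range(4)]
        data.append((vec, label))
    return data


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, vec, k):
    """Majority vote among the k training vectors closest to vec."""
    nearest = sorted(train, key=lambda item: sq_dist(item[0], vec))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def centroid_predict(train, vec, _param=None):
    """Assign vec to the class whose mean training vector is closest."""
    centroids = {}
    for label in {lab for _, lab in train}:
        rows = [v for v, lab in train if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], vec))


def accuracy(predict, param, train, test):
    """Fraction of held-out instances that are classified correctly."""
    hits = sum(predict(train, vec, param) == label for vec, label in test)
    return hits / len(test)


def best_pair(candidates, train, test):
    """Return (accuracy, algorithm name, parameter) of the best candidate."""
    scored = [(accuracy(fn, p, train, test), name, p) for name, fn, p in candidates]
    return max(scored, key=lambda s: s[0])


# Hypothetical candidate algorithm-parameter pairs.
CANDIDATES = [
    ("knn", knn_predict, 1),
    ("knn", knn_predict, 3),
    ("knn", knn_predict, 5),
    ("centroid", centroid_predict, None),
]

train = make_toy_pairs(60, seed=0)
test = make_toy_pairs(40, seed=1)  # generated separately from the training set
score, name, param = best_pair(CANDIDATES, train, test)
print(f"best: {name} (param={param}), held-out accuracy={score:.2f}")
```

Scoring every candidate on data that was never used for fitting is what makes the reported fraction of correctly classified instances meaningful. In a realistic setting the same loop would run over each training set and each algorithm-parameter pair.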