Qiu Zhijun, Zhou Bo, Yuan Jiangfeng
College of Food and Bioengineering, Henan University of Science and Technology, 263 Kai-Yuan Road, Luoyang, 471023, China.
J Theor Biol. 2017 Nov 21;433:57-63. doi: 10.1016/j.jtbi.2017.08.026. Epub 2017 Sep 1.
Protein-protein interaction site (PPIS) prediction must contend with the diversity of interaction sites, which limits prediction accuracy. In addition, proteins with unknown or unidentified interactions can contribute missing interfaces, and such data errors are often carried into the training dataset. To address these two problems, we used the minimum covariance determinant (MCD) method, which can remove outliers, to refine the training data and build a better-performing predictor. To predict test data in practice, we devised a method based on Mahalanobis distance to select suitable test data as input for the predictor. In leave-one-out validation and an independent test, our method achieved a higher Matthews correlation coefficient (MCC) after Mahalanobis distance screening, although only part of the test data could be predicted. These results indicate that data refinement is an effective approach to improving protein-protein interaction site prediction. With further optimization, the method may yield predictors with better performance and a wider range of application.
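The two steps described in the abstract, MCD-based refinement of the training data and Mahalanobis-distance screening of the test data, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no feature set, cutoff, or support fraction, so the function names (`mcd_cstep`, `refine_training_data`, `screen_test_data`), the 75% support fraction, the 97.5% chi-square cutoff, and the synthetic two-feature data are all assumptions made for illustration. The MCD estimate is approximated here by simple concentration steps from a deterministic start, without the reweighting or consistency correction that a full FAST-MCD implementation would apply.

```python
import numpy as np

def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X to (center, cov)."""
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def mcd_cstep(X, support_fraction=0.75, n_iter=50):
    """Approximate the MCD location and scatter by concentration steps.

    Simplified sketch: one deterministic start, no reweighting step.
    """
    n = X.shape[0]
    h = int(np.ceil(support_fraction * n))
    # Start from the h points closest to the coordinate-wise median,
    # with each feature scaled by its median absolute deviation.
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) + 1e-12
    subset = np.argsort((((X - med) / mad) ** 2).sum(axis=1))[:h]
    for _ in range(n_iter):
        center = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)
        # Keep the h points with the smallest distances under this estimate.
        new_subset = np.argsort(mahalanobis_sq(X, center, cov))[:h]
        if set(new_subset) == set(subset):
            break
        subset = new_subset
    center = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    return center, cov

def refine_training_data(X, cutoff):
    """Return the robust estimate and a mask of training rows kept as inliers."""
    center, cov = mcd_cstep(X)
    keep = mahalanobis_sq(X, center, cov) <= cutoff
    return center, cov, keep

def screen_test_data(X_test, center, cov, cutoff):
    """Mask of test rows close enough to the refined training data to predict."""
    return mahalanobis_sq(X_test, center, cov) <= cutoff

# Synthetic data (assumption): 200 typical samples plus 10 gross outliers.
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(200, 2))
outliers = rng.normal(8.0, 0.3, size=(10, 2))
X_train = np.vstack([inliers, outliers])

# 97.5% chi-square quantile; this closed form holds for 2 features only.
cutoff = -2.0 * np.log(0.025)

center, cov, keep = refine_training_data(X_train, cutoff)
X_refined = X_train[keep]  # a predictor would be trained on this subset

X_test = np.array([[0.1, -0.2], [8.0, 8.0]])
accepted = screen_test_data(X_test, center, cov, cutoff)
print("training rows kept:", int(keep.sum()), "of", len(X_train))
print("test rows accepted:", accepted.tolist())
```

Because the scatter is estimated from the innermost 75% of points without a correction factor, the distances are somewhat inflated and the cutoff is conservative: some ordinary samples are discarded along with the outliers. This mirrors the trade-off noted in the abstract, where screening improves MCC but leaves only part of the test data predictable.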