Effect of Protein Repetitiveness on Protein-Protein Interaction Prediction Results Using Support Vector Machines.

Author information

Zhou Jie

Affiliation

Guangdong Province Key Laboratory of Computer Network, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China.

Publication information

J Comput Biol. 2017 Feb;24(2):183-192. doi: 10.1089/cmb.2015.0233. Epub 2016 Aug 16.

DOI:10.1089/cmb.2015.0233

PMID:27529135

Abstract

BACKGROUND

Many computational approaches that predict protein-protein interactions using support vector machines (SVMs) report high performance. In fact, the performance of currently reported methods is significantly overestimated and is affected by the object repetitiveness of the datasets used.

OBJECTIVE

To study the effect of object repetitiveness in datasets on prediction results.

METHOD

We present novel methods that use graph maximum matching on protein-protein interaction datasets to construct positive datasets with or without repeated proteins. A corresponding series of negative datasets with different degrees of protein repetitiveness is constructed using the graph adjacency matrix. We analyze the relationship between the SVM prediction results and the repeated proteins (repeat numbers and repeat rates), as well as the distribution of repeated proteins in the datasets.
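The dataset construction described here can be illustrated with a minimal sketch. It is not the author's implementation: the function names (`greedy_matching`, `negative_pairs`, `repeat_counts`) and the toy network `ppi` are hypothetical, and a simple greedy maximal matching stands in for the true graph maximum matching named in the abstract. Treating every unrecorded pair as a negative example is also an assumption made for illustration.

```python
from collections import Counter
from itertools import combinations


def greedy_matching(edges):
    """Pick interactions so that no protein appears in more than one pair.

    Assumption: this is a greedy *maximal* matching, a simplified stand-in
    for the maximum matching mentioned in the abstract.
    """
    used = set()
    matched = []
    for a, b in edges:
        if a != b and a not in used and b not in used:
            matched.append((a, b))
            used.update((a, b))
    return matched


def negative_pairs(edges):
    """Return protein pairs whose adjacency-matrix entry is zero.

    Assumption: an unrecorded pair is treated as non-interacting.
    """
    proteins = sorted({p for edge in edges for p in edge})
    index = {p: i for i, p in enumerate(proteins)}
    n = len(proteins)
    adj = [[0] * n for _ in range(n)]
    for a, b in edges:
        adj[index[a]][index[b]] = 1
        adj[index[b]][index[a]] = 1
    # Scan the upper triangle so each unordered pair is reported once.
    return [
        (proteins[i], proteins[j])
        for i, j in combinations(range(n), 2)
        if adj[i][j] == 0
    ]


def repeat_counts(pairs):
    """Count how often each repeated protein (seen more than once) occurs."""
    counts = Counter(p for pair in pairs for p in pair)
    return {p: c for p, c in counts.items() if c > 1}


# Hypothetical toy interaction network.
ppi = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
positives = greedy_matching(ppi)   # positive set without repeated proteins
negatives = negative_pairs(ppi)    # candidate negative pairs
```

In this toy example, `repeat_counts(ppi)` reports proteins that occur in several interactions, while `repeat_counts(positives)` is empty because a matching uses each protein at most once. Sampling different subsets of `negatives` would give negative datasets with different degrees of repetitiveness.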

RESULTS

Protein repetitiveness in the positive and negative datasets can affect the prediction results: high protein repetitiveness in either the positive or the negative dataset yields high prediction performance.

CONCLUSION

This indicates that dealing with object repetitiveness in datasets is a key issue in protein-protein interaction prediction using SVMs, since real-world data contain a certain degree of repeated proteins.

Abstract (Chinese)