IEEE/ACM Trans Comput Biol Bioinform. 2017 Sep-Oct;14(5):1165-1172. doi: 10.1109/TCBB.2017.2649529. Epub 2017 Jan 10.
Self-interacting proteins (SIPs) play an important role in many aspects of the structural and functional organization of the cell. Detecting SIPs is one of the most important issues in current molecular biology. Although experimental methods have generated a large amount of SIP data, wet-laboratory approaches are both time-consuming and costly. In addition, they yield high false-negative and false-positive rates. Thus, there is a great need for in silico methods that predict SIPs accurately and efficiently. In this study, a new sequence-based method is proposed to predict SIPs. The evolutionary information contained in the Position-Specific Scoring Matrix (PSSM) is extracted from a protein of known sequence. The resulting features are then fed to an ensemble classifier to distinguish self-interacting from non-self-interacting proteins. When applied to Saccharomyces cerevisiae and human SIP data sets, the proposed method achieves accuracies of 86.86 and 91.30 percent, respectively. Our method also performs well when compared with the SVM classifier and previous methods. Consequently, the proposed method can be considered a promising new tool for predicting SIPs.
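The pipeline described above (PSSM, then a fixed-length feature vector, then an ensemble vote) can be illustrated with a minimal sketch. The abstract does not state which feature descriptor or which base classifiers are used, so everything here is an assumption made for illustration: `pssm_features` uses a simple column-wise mean, `majority_vote` is a plain unweighted vote, and the threshold rules in the usage note are toy stand-ins for trained models. It is not the authors' method.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A PSSM is treated here as one row per residue and one column per
# amino acid type (20 columns in practice; the toy example uses fewer).
Pssm = List[List[float]]
Classifier = Callable[[Sequence[float]], int]  # returns 1 (SIP) or 0 (non-SIP)


def pssm_features(pssm: Pssm) -> List[float]:
    """Collapse a variable-length PSSM into a fixed-length vector.

    Assumption: column-wise mean, chosen only because it is simple.
    The abstract does not specify the descriptor.
    """
    if not pssm:
        raise ValueError("empty PSSM")
    n_rows, n_cols = len(pssm), len(pssm[0])
    return [sum(row[j] for row in pssm) / n_rows for j in range(n_cols)]


def majority_vote(classifiers: Sequence[Classifier],
                  features: Sequence[float]) -> int:
    """Return the label predicted by most base classifiers.

    Use an odd number of classifiers so a binary vote cannot tie.
    """
    votes = Counter(clf(features) for clf in classifiers)
    return votes.most_common(1)[0][0]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

A usage sketch with hypothetical threshold rules in place of trained base classifiers:

```python
toy_pssm = [[1.0, 2.0], [3.0, 4.0]]          # 2 residues, 2 columns (toy size)
rules = [
    lambda f: int(f[0] > 1.0),               # hypothetical rule 1
    lambda f: int(f[1] > 5.0),               # hypothetical rule 2
    lambda f: int(sum(f) > 4.0),             # hypothetical rule 3
]
label = majority_vote(rules, pssm_features(toy_pssm))
```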