Jelínek Jan, Škoda Petr, Hoksza David
Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu 3, Prague 2, Czech Republic.
BMC Bioinformatics. 2017 Dec 6;18(Suppl 15):492. doi: 10.1186/s12859-017-1921-4.
Protein-protein interactions (PPIs) play a key role in the investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect.
We have developed a novel prediction method called INSPiRE, which draws on a knowledge base built from data available in the Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs, with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. The structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to the Matthews correlation coefficient.
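The knowledge-base idea described above can be illustrated with a minimal Python sketch. It is not the INSPiRE implementation: the 5-bit amino-acid code, the restriction to directly adjacent nodes, the 0.5 decision threshold, and all function names are assumptions made for illustration only, since the abstract does not specify them.

```python
from collections import defaultdict

# Assumption: each of the 20 standard amino acids gets a 5-bit code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_BITS = {aa: format(i, "05b") for i, aa in enumerate(AMINO_ACIDS)}


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set of neighbors map."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def encode_neighborhood(labels, adj, node):
    """Encode a node and its direct neighbors as a bit string.

    Neighbor codes are sorted so that the result does not depend on
    the order in which neighbors are visited (a hypothetical choice).
    """
    center = AA_BITS[labels[node]]
    around = sorted(AA_BITS[labels[n]] for n in adj[node])
    return center + "".join(around)


def add_to_knowledge_base(kb, labels, edges, interface_nodes):
    """Count how often each neighborhood occurs as interface / non-interface."""
    adj = build_adjacency(edges)
    for node in labels:
        key = encode_neighborhood(labels, adj, node)
        kb[key][0 if node in interface_nodes else 1] += 1


def predict(kb, labels, edges, threshold=0.5):
    """Label a node as interface if its neighborhood was mostly interface."""
    adj = build_adjacency(edges)
    result = {}
    for node in labels:
        key = encode_neighborhood(labels, adj, node)
        hits, misses = kb.get(key, (0, 0))
        total = hits + misses
        # Neighborhoods never seen in the knowledge base default to non-interface.
        result[node] = total > 0 and hits / total >= threshold
    return result


# Toy example: a three-residue chain A-K-G where only K is at the interface.
kb = defaultdict(lambda: [0, 0])
toy_labels = {0: "A", 1: "K", 2: "G"}
toy_edges = [(0, 1), (1, 2)]
add_to_knowledge_base(kb, toy_labels, toy_edges, interface_nodes={1})
prediction = predict(kb, toy_labels, toy_edges)
```

In this toy setup the query protein is the same one that populated the knowledge base, so `prediction` simply recovers the stored labels. A realistic evaluation would keep query proteins out of the knowledge base and would likely use larger neighborhoods and richer node features, as the abstract indicates.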
In this paper, we introduce INSPiRE, a new knowledge-based method for the identification of protein-protein interaction sites. Its knowledge base captures structural patterns of known interaction sites in the Protein Data Bank, which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI prediction approaches.