Chen Pao-Yang, Deane Charlotte M, Reinert Gesine
Department of Statistics, University of Oxford, Oxford, OX1 3TG, UK.
Bioinformatics. 2007 Sep 1;23(17):2314-21. doi: 10.1093/bioinformatics/btm342. Epub 2007 Jun 28.
The Majority Vote approach has demonstrated that protein-protein interactions can be used to predict the structure or function of a protein. In this article we propose a novel method for the prediction of such protein characteristics based on frequencies of pairwise interactions. In addition, we study a second new approach using the pattern frequencies of triplets of proteins, thus for the first time taking network structure explicitly into account. Both these methods are extended to jointly consider multiple organisms and multiple characteristics.
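The two pairwise ideas mentioned above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no scoring formula, so the frequency score below (summing how often a candidate category is seen interacting with each annotated neighbour's category) is an assumed stand-in, and all names (`build_neighbours`, `majority_vote`, `pair_frequencies`, `frequency_predict`) as well as the toy network are hypothetical. The Triplet-based method and the multi-organism extension are not covered.

```python
from collections import Counter, defaultdict


def build_neighbours(edges):
    """Turn an undirected edge list into a protein -> set of partners map."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def majority_vote(protein, neighbours, annotation):
    """Predict the most common category among annotated interaction partners."""
    votes = Counter(
        annotation[n] for n in neighbours.get(protein, ()) if n in annotation
    )
    if not votes:
        return None  # no annotated partner, so no prediction
    top = max(votes.values())
    # Ties are broken alphabetically so the result is deterministic.
    return min(c for c, v in votes.items() if v == top)


def pair_frequencies(edges, annotation):
    """Count how often each unordered pair of categories is linked by an edge."""
    freq = Counter()
    for a, b in edges:
        if a in annotation and b in annotation:
            freq[frozenset((annotation[a], annotation[b]))] += 1
    return freq


def frequency_predict(protein, neighbours, annotation, freq):
    """Assumed scoring rule: for each candidate category, sum the observed
    frequency of (candidate, partner category) pairs over annotated partners."""
    categories = set(annotation.values())
    partners = [n for n in neighbours.get(protein, ()) if n in annotation]
    scores = {
        c: sum(freq[frozenset((c, annotation[n]))] for n in partners)
        for c in categories
    }
    if not scores or max(scores.values()) == 0:
        return None
    top = max(scores.values())
    return min(c for c, s in scores.items() if s == top)


# Hypothetical toy network: p4 is the unannotated query protein.
edges = [
    ("p1", "p2"), ("p1", "p3"), ("p2", "p3"),
    ("p4", "p1"), ("p4", "p2"), ("p4", "p5"),
    ("p5", "p6"),
]
annotation = {
    "p1": "kinase", "p2": "kinase", "p3": "kinase",
    "p5": "transporter", "p6": "transporter",
}

neighbours = build_neighbours(edges)
freq = pair_frequencies(edges, annotation)

print(majority_vote("p4", neighbours, annotation))
print(frequency_predict("p4", neighbours, annotation, freq))
```

Majority Vote only looks at the labels of a protein's direct partners, whereas the frequency sketch first learns, from the annotated part of the network, which category pairs tend to interact, and then reuses those counts for the query protein.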
Compared to the standard non-network-based method, namely the Majority Vote method, our predictions tend to be more accurate in large networks. For structure prediction, the Frequency-based method reaches up to 71% accuracy and the Triplet-based method up to 72%, whereas for function prediction both methods reach up to 90% accuracy. Function prediction on proteins without homologues yields slightly lower but comparable accuracy. Including partially annotated proteins substantially increases the number of proteins whose characteristics our methods can predict with reasonable accuracy. We find that the enhanced Triplet-based method does not currently yield significantly better results than the enhanced Frequency-based method, suggesting that triplets of interactions do not contain substantially more information about protein characteristics than interaction pairs. Our methods offer two main improvements over current approaches: first, multiple protein characteristics are considered simultaneously, and second, data are integrated from multiple species. In addition, the Triplet-based method includes network structure more explicitly than the Majority Vote and Frequency-based methods.
The program is available upon request.
Supplementary data are available at Bioinformatics online.