Cho Young-Rae, Zhang Aidong
Department of Computer Science, Baylor University, Waco, TX 76798, USA.
IEEE Trans Inf Technol Biomed. 2010 Jan;14(1):30-6. doi: 10.1109/TITB.2009.2028234. Epub 2009 Sep 1.
Predicting protein function from protein interaction networks has been challenging because of the complexity of functional relationships among proteins. Most previous function prediction methods depend on the neighborhoods of, or the connecting paths to, known proteins. However, their accuracy has been limited by the functional inconsistency of interacting proteins. In this paper, we propose a novel approach to function prediction that identifies frequent patterns of functional associations in a protein interaction network. The set of functions that a protein performs is assigned to the corresponding node as a label, and a functional association pattern is then represented as a labeled subgraph. Our frequent labeled subgraph mining algorithm efficiently searches for the functional association patterns that occur frequently in the network. It iteratively increases the size of frequent patterns by one node at a time through selective joining, and simplifies the network by a priori pruning. Using the yeast protein interaction network, our algorithm found more than 1400 frequent functional association patterns. Function prediction is performed by matching the subgraph that includes the unknown protein with the frequent patterns analogous to it. By leave-one-out cross-validation, we show that our approach achieves higher prediction accuracy than previous link-based methods. The frequent functional association patterns generated in this study might become the foundation for advanced analysis of the functional behavior of proteins at the system level.
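The abstract only outlines the mining and matching steps. The Python sketch below is a simplified illustration under stated assumptions, not the authors' algorithm: it restricts patterns to labeled *paths* rather than general subgraphs, grows them by appending one node instead of using the paper's selective joining, assumes minimum image-based (MNI) support as the frequency measure (it never increases when a pattern is extended, so a priori pruning is safe), and scores an unknown protein's candidate functions by summing the supports of frequent patterns that match the labeled paths ending at it. All function names, the toy network, and the function labels are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected protein interaction network as an adjacency map."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def mni_support(embeddings):
    """Assumed support measure (MNI): for each pattern position, count the
    distinct proteins mapped to it and take the smallest count."""
    size = len(embeddings[0])
    return min(len({emb[i] for emb in embeddings}) for i in range(size))


def mine_frequent_paths(adj, labels, min_sup, max_len):
    """Level-wise mining of frequent labeled path patterns (simplification).

    `labels` maps each annotated protein to a frozenset of its functions;
    unannotated proteins are absent. A pattern is the tuple of labels along
    a simple directed path. Returns {pattern: support}.
    """
    groups = defaultdict(list)
    for node, lab in labels.items():
        groups[(lab,)].append((node,))
    level = {p: e for p, e in groups.items() if mni_support(e) >= min_sup}
    frequent = {p: mni_support(e) for p, e in level.items()}

    for _ in range(max_len - 1):
        groups = defaultdict(list)
        for embeddings in level.values():
            for path in embeddings:
                # Grow each occurrence of a frequent pattern by one node.
                for nbr in adj[path[-1]]:
                    if nbr in path or nbr not in labels:
                        continue
                    new_path = path + (nbr,)
                    pattern = tuple(labels[n] for n in new_path)
                    # A priori pruning: the suffix sub-pattern must also be
                    # frequent (the prefix already is, by construction).
                    if pattern[1:] not in frequent:
                        continue
                    groups[pattern].append(new_path)
        level = {p: e for p, e in groups.items() if mni_support(e) >= min_sup}
        if not level:
            break
        frequent.update({p: mni_support(e) for p, e in level.items()})
    return frequent


def predict_functions(adj, labels, frequent, protein, max_len):
    """Score candidate labels for an unannotated `protein` by matching the
    labeled paths that end at it against the frequent patterns."""
    candidates = {p[-1] for p in frequent}
    scores = defaultdict(int)
    stack = [(n,) for n in adj[protein] if n in labels]
    while stack:
        path = stack.pop()  # path[0] is a direct interactor of `protein`
        context = tuple(labels[n] for n in reversed(path))
        for lab in candidates:
            support = frequent.get(context + (lab,))
            if support:
                scores[lab] += support
        if len(path) + 1 < max_len:
            for nbr in adj[path[-1]]:
                if nbr in labels and nbr not in path:
                    stack.append(path + (nbr,))
    return dict(scores)


# Hypothetical toy data: three transcription-signaling-transport chains,
# plus an unannotated protein "u" attached to a fourth, partial chain.
TX, SIG, TR = (frozenset({f}) for f in ("transcription", "signaling", "transport"))
edges = [("a1", "b1"), ("b1", "c1"), ("a2", "b2"), ("b2", "c2"),
         ("a3", "b3"), ("b3", "c3"), ("a4", "b4"), ("b4", "u")]
labels = {"a1": TX, "a2": TX, "a3": TX, "a4": TX,
          "b1": SIG, "b2": SIG, "b3": SIG, "b4": SIG,
          "c1": TR, "c2": TR, "c3": TR}
adj = build_adjacency(edges)
freq = mine_frequent_paths(adj, labels, min_sup=3, max_len=3)
scores = predict_functions(adj, labels, freq, "u", max_len=3)
print(max(scores, key=scores.get))  # frozenset({'transport'})
```

On this toy network, "u" sits where the transport protein would be in the repeated chain, so the transport label, supported by both a two-node and a three-node frequent pattern, receives the highest score. In a leave-one-out evaluation, each annotated protein would be removed from `labels` in turn and its hidden label compared with the top-scoring prediction. Extending the sketch from paths to general labeled subgraphs would additionally require a canonical form to detect duplicate patterns.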