Park Hyeon Ah, Kim Taewook, Li Meijing, Shon Ho Sun, Park Jeong Seok, Ryu Keun Ho
Database/Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju, Korea.
Syntekabio Incorporated, Korea Institute of Science and Technology, Seoul, Korea.
Osong Public Health Res Perspect. 2015 Apr;6(2):112-20. doi: 10.1016/j.phrp.2015.01.006. Epub 2015 Feb 24.
Predicting protein function from a protein-protein interaction network is challenging because of the network's complexity, the huge scale of the protein interaction process, and its inconsistent patterns. Previously proposed methods such as neighbor counting, network analysis, and graph pattern mining have predicted functions by calculating the rules and probabilities of patterns inside the network. Although these methods have shown good prediction performance, they still have difficulty finding functions that are exceptions to simple rules and patterns, because they do not consider the inconsistent aspects of the interaction network.
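For orientation, the neighbor counting baseline mentioned above can be sketched as follows. This is a minimal illustration and not the implementation evaluated in the article: the function name `neighbor_counting`, the toy proteins `P1` to `P4`, and the function labels are all hypothetical. The idea is to rank candidate functions for a target protein by how many of its direct interaction partners carry each function.

```python
from collections import Counter


def neighbor_counting(graph, annotations, target, top_k=1):
    """Rank functions by how often they occur among the target's direct neighbors.

    graph:       dict mapping a protein to a list of its interaction partners
    annotations: dict mapping a protein to a set of known function labels
    """
    counts = Counter()
    for neighbor in graph.get(target, ()):
        counts.update(annotations.get(neighbor, ()))
    # Sort by descending count, then by label so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [function for function, _ in ranked[:top_k]]


# Hypothetical toy network: P1 is the unannotated target protein.
graph = {"P1": ["P2", "P3", "P4"]}
annotations = {
    "P2": {"transport"},
    "P3": {"transport", "binding"},
    "P4": {"transport", "binding"},
}

print(neighbor_counting(graph, annotations, "P1", top_k=2))
```

Because only direct neighbors vote, a function that appears further away in the network, or only in an irregular position, is never proposed. That is the kind of limitation the abstract attributes to simple rule-based methods.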
In this article, we propose a novel approach using sequential pattern mining with gap constraints. To overcome the inconsistency problem, we propose frequent functional patterns that include every possible functional sequence, including patterns whose search is limited by the connection structure or by the level of the neighborhood layer. We also constructed a tree-graph containing the most crucial interaction information of the target protein, and generated candidate function sets for assignment by sequential pattern mining that allows gaps.
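The general idea of sequential pattern mining with a gap constraint can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' algorithm: it assumes each input sequence is an ordered list of function labels (for example, read along one path of the tree-graph, which the abstract does not specify), and the names `gapped_subsequences` and `frequent_patterns` are hypothetical. A pattern matches a sequence if its items appear in order with at most `max_gap` skipped items between consecutive matches, and a pattern is kept if the fraction of sequences it matches reaches `min_support`.

```python
from collections import Counter


def gapped_subsequences(seq, length, max_gap):
    """Return all distinct subsequences of `length` items in which consecutive
    chosen items are separated by at most `max_gap` skipped items."""
    found = set()

    def extend(last_idx, prefix):
        if len(prefix) == length:
            found.add(prefix)
            return
        start = last_idx + 1
        stop = min(len(seq), start + max_gap + 1)
        for i in range(start, stop):
            extend(i, prefix + (seq[i],))

    for i in range(len(seq)):
        extend(i, (seq[i],))
    return found


def frequent_patterns(sequences, length, max_gap, min_support):
    """Map each pattern to its support (fraction of sequences containing it),
    keeping only patterns whose support is at least `min_support`."""
    support = Counter()
    for seq in sequences:
        # Each sequence contributes at most once per pattern.
        support.update(gapped_subsequences(seq, length, max_gap))
    total = len(sequences)
    return {p: c / total for p, c in support.items() if c / total >= min_support}


# Hypothetical function-label sequences.
sequences = [
    ("A", "B", "C"),
    ("A", "X", "C"),
    ("A", "C", "D"),
    ("B", "A", "Y", "Z", "C"),
]

print(frequent_patterns(sequences, length=2, max_gap=1, min_support=0.5))
```

In this toy data, the pattern `("A", "C")` is found with `max_gap=1` even though another label often sits between `A` and `C`, whereas with `max_gap=0` no pattern reaches the support threshold. Allowing gaps is what lets the mining step tolerate irregular intermediate items.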
The parameters of pattern length, maximum gap, and minimum support were varied to find the setting that gives the most accurate prediction. The highest accuracy was 0.972, which was better than the results of the simple neighbor counting approach and the link-based approach.
Comparison of the results with other approaches confirmed that the proposed approach could reach more function candidates that previous methods could not obtain.