Application of gap-constraints given sequential frequent pattern mining for protein function prediction.

Authors

Park Hyeon Ah, Kim Taewook, Li Meijing, Shon Ho Sun, Park Jeong Seok, Ryu Keun Ho

Affiliations

Database/Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju, Korea.

Syntekabio Incorporated, Korea Institute of Science and Technology, Seoul, Korea.

Publication information

Osong Public Health Res Perspect. 2015 Apr;6(2):112-20. doi: 10.1016/j.phrp.2015.01.006. Epub 2015 Feb 24.

Abstract

OBJECTIVES

Predicting protein function from a protein-protein interaction network is challenging because of the network's complexity, the huge scale of the interaction process, and its inconsistent patterns. Previously proposed methods such as neighbor counting, network analysis, and graph pattern mining predict functions by calculating the rules and probabilities of patterns inside the network. Although these methods predict well, they still have difficulty finding functions that are exceptions to simple rules and patterns, because they do not consider the inconsistent aspects of the interaction network.

METHODS

In this article, we propose a novel approach that uses sequential pattern mining with gap constraints. To overcome the inconsistency problem, we propose that frequent functional patterns include every possible functional sequence, including patterns whose search is limited by the connection structure or the level of the neighborhood layer. We also constructed a tree-graph containing the most crucial interaction information of the target protein, and generated candidate sets for assignment by sequential pattern mining that allows gaps.
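As an illustration of the general idea of sequential pattern mining with a maximum-gap constraint, the following is a minimal sketch, not the authors' implementation. It assumes each input sequence is a tuple of function labels (for example, labels read along one path of a tree-graph), and that a "gap" is the number of items skipped between two consecutive matched items. The names `occurs_with_gap` and `mine_frequent_patterns`, the labels `F1` to `F4`, and the toy sequences are all hypothetical.

```python
def occurs_with_gap(pattern, sequence, max_gap):
    """Return True if `pattern` occurs in `sequence` as a subsequence in which
    at most `max_gap` items are skipped between consecutive matched items."""
    if not pattern:
        return True
    # Every position where the first pattern item can be matched.
    ends = {i for i, x in enumerate(sequence) if x == pattern[0]}
    for item in pattern[1:]:
        # Extend each partial match to a later position within the gap limit.
        ends = {
            j
            for i in ends
            for j in range(i + 1, min(i + max_gap + 2, len(sequence)))
            if sequence[j] == item
        }
        if not ends:
            return False
    return bool(ends)


def mine_frequent_patterns(sequences, max_len, max_gap, min_support):
    """Level-wise mining: grow patterns one item at a time and keep those whose
    support (fraction of sequences containing them) reaches `min_support`.
    Extending only frequent prefixes is safe, because any occurrence of a
    pattern under the gap constraint also contains an occurrence of its prefix."""
    n = len(sequences)
    alphabet = sorted({x for s in sequences for x in s})
    frequent = {}
    level = [(a,) for a in alphabet]
    for _ in range(max_len):
        kept = []
        for p in level:
            sup = sum(occurs_with_gap(p, s, max_gap) for s in sequences) / n
            if sup >= min_support:
                frequent[p] = sup
                kept.append(p)
        level = [p + (a,) for p in kept for a in alphabet]
    return frequent


# Hypothetical toy input: function labels along three paths.
toy_sequences = [
    ("F1", "F2", "F3"),
    ("F1", "F4", "F2", "F3"),
    ("F1", "F4", "F4", "F3"),
]
patterns = mine_frequent_patterns(toy_sequences, max_len=2, max_gap=1, min_support=0.6)
for pattern, support in sorted(patterns.items()):
    print(pattern, round(support, 3))
```

In this sketch the three arguments `max_len`, `max_gap`, and `min_support` correspond to the pattern length, maximum gap, and minimum support parameters named in the abstract. How the resulting patterns are turned into function candidates for a target protein is not specified here and is left out.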

RESULTS

Values for pattern length, maximum gap, and minimum support were set to find the configuration that gives the most accurate prediction. The highest accuracy was 0.972, better than both the simple neighbor counting approach and the link-based approach.

CONCLUSION

Comparison with other approaches confirmed that the proposed approach can reach function candidates that previous methods could not obtain.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e94/4411351/e337a2d79b26/gr1.jpg