Application of gap-constraints given sequential frequent pattern mining for protein function prediction.

Author information

Park Hyeon Ah, Kim Taewook, Li Meijing, Shon Ho Sun, Park Jeong Seok, Ryu Keun Ho

Affiliations

Database/Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju, Korea.

Syntekabio Incorporated, Korea Institute of Science and Technology, Seoul, Korea.

Publication information

Osong Public Health Res Perspect. 2015 Apr;6(2):112-20. doi: 10.1016/j.phrp.2015.01.006. Epub 2015 Feb 24.

DOI:10.1016/j.phrp.2015.01.006

PMID:25938021

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC4411351/

Abstract

OBJECTIVES

Predicting protein function from a protein-protein interaction network is challenging because of the complexity and scale of the interaction process and the inconsistency of its patterns. Previously proposed methods such as neighbor counting, network analysis, and graph pattern mining have predicted functions by calculating the rules and probabilities of patterns within the network. Although these methods predict well, they still have difficulty finding functions that deviate from simple rules and patterns, because they do not account for the inconsistent aspects of the interaction network.
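For orientation, the neighbor counting idea mentioned above can be sketched as follows. This is a minimal illustration under our own assumptions, not code from the article: the function name `neighbor_counting`, the edge-list input, the annotation dictionary, and the protein and function labels are all hypothetical.

```python
from collections import Counter


def neighbor_counting(target, interactions, annotations, top_k=1):
    """Rank candidate functions for `target` by how many of its direct
    interaction partners carry each function (hypothetical baseline sketch)."""
    # Collect direct neighbors of the target from an undirected edge list.
    neighbors = set()
    for a, b in interactions:
        if a == target:
            neighbors.add(b)
        elif b == target:
            neighbors.add(a)
    neighbors.discard(target)  # ignore self-interactions

    # Count each function once per annotated neighbor.
    counts = Counter()
    for protein in neighbors:
        counts.update(annotations.get(protein, ()))
    return [function for function, _ in counts.most_common(top_k)]


# Illustrative data only: proteins P1..P4 and three made-up function labels.
interactions = [("P1", "P2"), ("P1", "P3"), ("P4", "P1"), ("P2", "P3")]
annotations = {
    "P2": {"transport"},
    "P3": {"transport", "signaling"},
    "P4": {"metabolism"},
}
print(neighbor_counting("P1", interactions, annotations))
```

Because such a baseline looks only at direct partners, it cannot reach functions that appear further out in the network, which is the limitation the abstract points to.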

METHODS

In this article, we propose a novel approach using sequential pattern mining with gap constraints. To overcome the inconsistency problem, we propose that frequent functional patterns include every possible functional sequence, including patterns whose search is limited by the structure of connections or the level of the neighborhood layer. We also constructed a tree graph holding the most crucial interaction information of the target protein, and generated candidate function sets to assign by sequential pattern mining that allows gaps.
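To make the role of the gap constraint concrete, here is a minimal sketch of gap-constrained sequential pattern mining over sequences of function labels. It is an illustration under stated assumptions, not the authors' implementation: the names `gapped_patterns` and `mine`, the parameters `max_len`, `max_gap`, and `min_support`, and the label sequences are hypothetical, and support is assumed to be the fraction of sequences that contain a pattern.

```python
from collections import Counter


def gapped_patterns(seq, max_len, max_gap):
    """Return every pattern of length 1..max_len that occurs in `seq` with at
    most `max_gap` skipped items between consecutive matched items."""
    found = set()

    def extend(pattern, last_idx):
        found.add(pattern)
        if len(pattern) == max_len:
            return
        # The next matched item may sit at most max_gap + 1 positions later.
        stop = min(len(seq), last_idx + max_gap + 2)
        for j in range(last_idx + 1, stop):
            extend(pattern + (seq[j],), j)

    for i in range(len(seq)):
        extend((seq[i],), i)
    return found


def mine(sequences, max_len=3, max_gap=1, min_support=0.5):
    """Map each frequent pattern to its support (fraction of sequences)."""
    counts = Counter()
    for seq in sequences:
        # A set per sequence, so each pattern counts once per sequence.
        counts.update(gapped_patterns(seq, max_len, max_gap))
    n = len(sequences)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


# Illustrative label sequences, e.g. functions read along paths of a tree graph.
paths = [("A", "B", "C"), ("A", "X", "C"), ("A", "X", "Y", "C")]
frequent = mine(paths, max_len=2, max_gap=1, min_support=0.6)
for pattern, support in sorted(frequent.items()):
    print(pattern, round(support, 3))
```

With `max_gap=0` the pattern `("A", "C")` never matches in these sequences, whereas allowing one skipped item lets it match two of the three. This is the sense in which permitting gaps can surface functional patterns that strictly contiguous matching would miss.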

RESULTS

Pattern length, maximum gap, and minimum support were varied to find the setting that gives the most accurate prediction. The highest accuracy was 0.972, which was better than that of the simple neighbor counting approach and the link-based approach.

CONCLUSION

Comparison with other approaches confirmed that the proposed approach can reach more function candidates that previous methods could not obtain.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e94/4411351/e337a2d79b26/gr1.jpg
