Frequent patterns mining in multiple biological sequences.

Affiliations

College of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225009, China; National Key Lab of Novel Software Tech, Nanjing University, Nanjing 210093, China.

Publication information

Comput Biol Med. 2013 Oct;43(10):1444-52. doi: 10.1016/j.compbiomed.2013.07.009. Epub 2013 Jul 27.

DOI:10.1016/j.compbiomed.2013.07.009

PMID:24034736

Abstract

Existing algorithms for mining frequent patterns in multiple biosequences may generate multiple projected databases and short candidate patterns, which can increase computation time and memory requirements. To overcome these shortcomings, we propose a fast and efficient algorithm for mining frequent patterns in multiple biological sequences (MSPM). We first present the concept of a primary pattern, which can be extended to form larger patterns in the sequence. To detect frequent primary patterns, a prefix tree is constructed. Based on this prefix tree, a pattern-extending approach is also presented to mine frequent patterns without producing a large number of irrelevant candidate patterns. The experimental results show that the MSPM algorithm is not only faster than other methods but also produces higher-quality results.
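The abstract describes MSPM only at a high level, so the following is a minimal sketch of the general idea (prefix tree for short seed patterns, then extension of the frequent ones) and not the authors' algorithm. It assumes that a pattern is a contiguous substring, that the support of a pattern is the number of distinct sequences containing it, and that a "primary pattern" can be modeled as a fixed-length seed of length `k`; all three are simplifying assumptions. The names `build_seed_trie`, `frequent_seeds`, `extend`, and `mine` are hypothetical.

```python
from collections import defaultdict


class TrieNode:
    """Prefix-tree node over fixed-length seeds (a hypothetical stand-in for primary patterns)."""

    def __init__(self):
        self.children = {}
        self.seq_ids = set()  # distinct sequences with a seed passing through this node


def build_seed_trie(sequences, k):
    """Insert every length-k window of every sequence into a prefix tree."""
    root = TrieNode()
    for sid, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            node = root
            for ch in seq[start:start + k]:
                node = node.children.setdefault(ch, TrieNode())
                node.seq_ids.add(sid)
    return root


def frequent_seeds(root, k, min_sup):
    """Depth-first walk that prunes any branch whose support drops below min_sup."""
    result = []
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if len(prefix) == k:
            result.append(prefix)
            continue
        for ch, child in node.children.items():
            # A child's seq_ids is a subset of its parent's, so pruning here is safe.
            if len(child.seq_ids) >= min_sup:
                stack.append((child, prefix + ch))
    return result


def occurrences(sequences, pattern):
    """All (sequence id, start offset) pairs where pattern occurs, overlaps included."""
    occ = []
    for sid, seq in enumerate(sequences):
        start = seq.find(pattern)
        while start != -1:
            occ.append((sid, start))
            start = seq.find(pattern, start + 1)
    return occ


def extend(sequences, pattern, occ, min_sup, out):
    """Grow pattern one symbol to the right, only along symbols that actually follow it."""
    by_next = defaultdict(list)
    for sid, start in occ:
        end = start + len(pattern)
        if end < len(sequences[sid]):
            by_next[sequences[sid][end]].append((sid, start))
    for ch, next_occ in by_next.items():
        if len({sid for sid, _ in next_occ}) >= min_sup:
            longer = pattern + ch
            out.add(longer)
            extend(sequences, longer, next_occ, min_sup, out)


def mine(sequences, k, min_sup):
    """Return every contiguous pattern of length >= k supported by >= min_sup sequences."""
    root = build_seed_trie(sequences, k)
    out = set()
    for seed in frequent_seeds(root, k, min_sup):
        out.add(seed)
        extend(sequences, seed, occurrences(sequences, seed), min_sup, out)
    return out
```

For example, `sorted(mine(["ACGTAC", "TACGTT", "GACGTA"], k=2, min_sup=3))` returns `['AC', 'ACG', 'ACGT', 'CG', 'CGT', 'GT', 'TA']`. Extensions are grouped by the symbol that actually follows each occurrence, so the sketch never builds a candidate that does not appear in the data. This loosely mirrors the abstract's goal of avoiding irrelevant candidates; the paper's actual extension rule may differ.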

Similar articles

Efficiently mining time-delayed gene expression patterns.

IEEE Trans Syst Man Cybern B Cybern. 2010 Apr;40(2):400-11. doi: 10.1109/TSMCB.2009.2025564. Epub 2009 Oct 30.

An iterative data mining approach for mining overlapping coexpression patterns in noisy gene expression data.

IEEE Trans Nanobioscience. 2009 Sep;8(3):252-8. doi: 10.1109/TNB.2009.2026747. Epub 2009 Jul 14.

A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences.

Comput Math Methods Med. 2015;2015:680434. doi: 10.1155/2015/680434. Epub 2015 May 28.

Gene association analysis: a survey of frequent pattern mining from gene expression data.

Brief Bioinform. 2010 Mar;11(2):210-24. doi: 10.1093/bib/bbp042. Epub 2009 Oct 8.

Mining coherent dense subgraphs across massive biological networks for functional discovery.

Bioinformatics. 2005 Jun;21 Suppl 1:i213-21. doi: 10.1093/bioinformatics/bti1049.

Discovering metric temporal constraint networks on temporal databases.

Artif Intell Med. 2013 Jul;58(3):139-54. doi: 10.1016/j.artmed.2013.03.006. Epub 2013 May 6.

MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure.

PLoS One. 2018 Apr 23;13(4):e0195601. doi: 10.1371/journal.pone.0195601. eCollection 2018.

Incremental fuzzy mining of gene expression data for gene function prediction.

IEEE Trans Biomed Eng. 2011 May;58(5):1246-52. doi: 10.1109/TBME.2010.2047724. Epub 2010 Apr 15.

Constraint-based knowledge discovery from SAGE data.

In Silico Biol. 2008;8(2):157-75.

Cited by

MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure.

PLoS One. 2018 Apr 23;13(4):e0195601. doi: 10.1371/journal.pone.0195601. eCollection 2018.