HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences.

Affiliation

Department of Computer Science and Engineering, Computational Biosciences Program, University of Colorado, Denver, CO, USA.

Publication information

Bioinformatics. 2010 Feb 1;26(3):302-9. doi: 10.1093/bioinformatics/btp676. Epub 2009 Dec 8.

DOI:10.1093/bioinformatics/btp676

PMID:19996163

Abstract

MOTIVATION

Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local sub-optimal solutions. In addition, they cannot generate gapped motifs. The effectiveness of EM algorithms in motif finding can be improved by incorporating methods that choose different sets of initial parameters to enable escape from local optima, and that allow gapped alignments within motif models.
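The dependence of EM on its starting point can be illustrated with a minimal sketch. It assumes a one-occurrence-per-sequence motif model, a uniform background, and ungapped motifs, and it uses plain random restarts as a stand-in for a smarter search over initial parameters. All function names (`random_pwm`, `em_oops`, `multi_start_em`) are hypothetical, and this is not the HIGEDA or MEME implementation.

```python
import math
import random

ALPHABET = "ACGT"
BG = 1.0 / len(ALPHABET)  # assumed uniform background frequency

def random_pwm(width, rng):
    """Random starting point: every column is a normalized random distribution."""
    pwm = []
    for _ in range(width):
        weights = [rng.random() + 0.1 for _ in ALPHABET]
        total = sum(weights)
        pwm.append({a: w / total for a, w in zip(ALPHABET, weights)})
    return pwm

def site_ratios(seq, pwm):
    """Motif-versus-background likelihood ratio at every possible start in seq."""
    width = len(pwm)
    return [math.prod(pwm[k][seq[s + k]] / BG for k in range(width))
            for s in range(len(seq) - width + 1)]

def log_likelihood(seqs, pwm):
    """Log-likelihood of the data when each sequence holds exactly one site."""
    total = 0.0
    for seq in seqs:
        ratios = site_ratios(seq, pwm)
        total += math.log(sum(ratios) / len(ratios)) + len(seq) * math.log(BG)
    return total

def em_oops(seqs, pwm, iterations=50, pseudocount=0.5):
    """Refine a starting PWM with EM and return the refined PWM."""
    width = len(pwm)
    for _ in range(iterations):
        counts = [{a: pseudocount for a in ALPHABET} for _ in range(width)]
        for seq in seqs:
            ratios = site_ratios(seq, pwm)          # E-step: site posteriors
            norm = sum(ratios)
            for s, r in enumerate(ratios):
                for k in range(width):
                    counts[k][seq[s + k]] += r / norm
        # M-step: re-estimate each column from the expected counts
        pwm = [{a: c[a] / sum(c.values()) for a in ALPHABET} for c in counts]
    return pwm

def multi_start_em(seqs, width, n_starts=10, seed=0):
    """Run EM from several random starts; return the best run and all scores."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_starts):
        pwm = em_oops(seqs, random_pwm(width, rng))
        runs.append((log_likelihood(seqs, pwm), pwm))
    best = max(runs, key=lambda run: run[0])
    return best, [ll for ll, _ in runs]
```

Comparing the per-start log-likelihoods returned by `multi_start_em` shows whether different initializations settle on different solutions. Keeping only the best run is the simplest way to reduce the risk of reporting a local optimum.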

RESULTS

We have developed HIGEDA, an algorithm that uses the hierarchical gene-set genetic algorithm (HGA) with EM to initiate and search for the best parameters for the motif model. In addition, HIGEDA can identify gapped motifs using a position weight matrix and dynamic programming to generate an optimal gapped alignment of the motif model with sequences from the dataset. We show that HIGEDA outperforms MEME and other motif-finding algorithms on both DNA and protein sequences.
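The idea of aligning a position weight matrix to a sequence with gaps via dynamic programming can be sketched as follows. This is an illustrative recurrence under stated assumptions, not the one used in HIGEDA: gaps are modelled only as skipped sequence letters between consecutive motif columns, the gap cost is linear, the background is uniform, and the names `log_odds_pwm`, `best_gapped_match`, `gap_penalty`, and `max_gap` are hypothetical.

```python
import math

def log_odds_pwm(sites, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds PWM from equal-length example sites (uniform background)."""
    background = 1.0 / len(alphabet)
    pwm = []
    for col in range(len(sites[0])):
        counts = {a: pseudocount for a in alphabet}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pwm.append({a: math.log2(counts[a] / total / background) for a in alphabet})
    return pwm

def best_gapped_match(pwm, seq, gap_penalty=2.0, max_gap=2):
    """Align all PWM columns, in order, to seq.

    Up to max_gap sequence letters may be skipped between two consecutive
    columns, each skipped letter costing gap_penalty. Returns the best score
    and the sequence position matched to each column.
    """
    width, n = len(pwm), len(seq)
    if n < width:
        raise ValueError("sequence is shorter than the motif")
    neg_inf = float("-inf")
    # score[j][i]: best score with column j matched to sequence position i
    score = [[neg_inf] * n for _ in range(width)]
    back = [[-1] * n for _ in range(width)]
    for i in range(n):
        score[0][i] = pwm[0][seq[i]]
    for j in range(1, width):
        for i in range(j, n):
            for gap in range(max_gap + 1):
                prev = i - 1 - gap
                if prev < 0:
                    break
                if score[j - 1][prev] == neg_inf:
                    continue
                cand = score[j - 1][prev] - gap_penalty * gap + pwm[j][seq[i]]
                if cand > score[j][i]:
                    score[j][i] = cand
                    back[j][i] = prev
    # Trace back from the best-scoring end position
    end = max(range(n), key=lambda i: score[width - 1][i])
    positions = [end]
    for j in range(width - 1, 0, -1):
        positions.append(back[j][positions[-1]])
    positions.reverse()
    return score[width - 1][end], positions

pwm = log_odds_pwm(["TATAAT", "TATAAT", "TATGAT", "TACAAT"])
score, positions = best_gapped_match(pwm, "GGTATCAATGG")
print(positions)  # [2, 3, 4, 6, 7, 8]: the inserted C at index 5 is skipped
```

Setting `max_gap=0` reduces the recurrence to an ordinary ungapped PWM scan, which makes it easy to compare the two on the same sequence.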

AVAILABILITY AND IMPLEMENTATION

Source code and test datasets are available for download at http://ouray.cudenver.edu/~tnle/, implemented in C++ and supported on Linux and MS Windows.
