

HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences.

Author information

Department of Computer Science and Engineering, Computational Biosciences Program, University of Colorado, Denver, CO, USA.

Publication information

Bioinformatics. 2010 Feb 1;26(3):302-9. doi: 10.1093/bioinformatics/btp676. Epub 2009 Dec 8.

Abstract

MOTIVATION

Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local sub-optimal solutions. In addition, they cannot generate gapped motifs. The effectiveness of EM algorithms in motif finding can be improved by incorporating methods that choose different sets of initial parameters to enable escape from local optima, and that allow gapped alignments within motif models.

RESULTS

We have developed HIGEDA, an algorithm that uses the hierarchical gene-set genetic algorithm (HGA) with EM to initiate and search for the best parameters for the motif model. In addition, HIGEDA can identify gapped motifs using a position weight matrix and dynamic programming to generate an optimal gapped alignment of the motif model with sequences from the dataset. We show that HIGEDA outperforms MEME and other motif-finding algorithms on both DNA and protein sequences.
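The gapped-alignment idea can be illustrated with a minimal sketch: a dynamic program that aligns the columns of a position weight matrix, in order, against a sequence while allowing gaps. This is not HIGEDA's actual scoring scheme or code. The function names `toy_pwm` and `best_gapped_match`, the linear `gap_penalty`, and the match/mismatch scores are all assumptions made for illustration.

```python
from typing import Dict, List

NEG_INF = float("-inf")


def toy_pwm(consensus: str, alphabet: str = "ACGT",
            match: float = 2.0, mismatch: float = -1.0) -> List[Dict[str, float]]:
    """Build a hypothetical score matrix from a consensus string.

    A real PWM would hold log-odds scores estimated from data (e.g. by EM);
    fixed match/mismatch scores are used here only to keep the example small.
    """
    return [{a: (match if a == c else mismatch) for a in alphabet}
            for c in consensus]


def best_gapped_match(pwm: List[Dict[str, float]], seq: str,
                      gap_penalty: float = 2.0) -> float:
    """Best score for aligning all PWM columns, in order, to part of seq.

    The motif may start and end anywhere in seq at no cost. Skipping a
    residue between two motif columns, or leaving a column unmatched,
    costs gap_penalty (an assumed linear penalty).
    """
    w, n = len(pwm), len(seq)
    # score[i][j]: best score after placing the first i columns within seq[:j].
    # Row 0 is all zeros, which makes the start position free.
    score = [[0.0] * (n + 1)] + [[NEG_INF] * (n + 1) for _ in range(w)]
    for i in range(1, w + 1):
        for j in range(n + 1):
            # Column i-1 is left unmatched.
            best = score[i - 1][j] - gap_penalty
            if j > 0:
                # Column i-1 is matched to residue seq[j-1].
                residue_score = pwm[i - 1].get(seq[j - 1], NEG_INF)
                best = max(best, score[i - 1][j - 1] + residue_score)
                # Residue seq[j-1] is skipped inside the motif; not allowed
                # after the last column, because the end position is free.
                if i < w:
                    best = max(best, score[i][j - 1] - gap_penalty)
            score[i][j] = best
    # Free end: take the best cell in the last row.
    return max(score[w])
```

With `pwm = toy_pwm("ACG")`, the call `best_gapped_match(pwm, "TTACGTT")` returns 6.0 for the ungapped occurrence, and `best_gapped_match(pwm, "TTACTGTT")` returns 4.0: three matches minus one gap penalty for the skipped `T`.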

AVAILABILITY AND IMPLEMENTATION

Source code and test datasets are available for download at http://ouray.cudenver.edu/~tnle/. HIGEDA is implemented in C++ and runs on Linux and MS Windows.

