A generic motif discovery algorithm for sequential data.

Authors

Jensen Kyle L, Styczynski Mark P, Rigoutsos Isidore, Stephanopoulos Gregory N

Affiliation

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Publication

Bioinformatics. 2006 Jan 1;22(1):21-8. doi: 10.1093/bioinformatics/bti745. Epub 2005 Oct 27.

DOI:10.1093/bioinformatics/bti745

PMID:16257985

Abstract

MOTIVATION

Motif discovery in sequential data is a problem of great interest and with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations and are each typically only applicable to a certain class of problems.

RESULTS

Here we present a generic motif discovery algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. In addition, the algorithm allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acid sequences, a new solution to the (l,d)-motif problem in DNA sequences and the discovery of conserved protein substructures.
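The claim that any similarity metric can be used, on categorical or real-valued data alike, can be illustrated with a small sketch. This is not Gemoda's implementation, which the abstract does not describe. It is a brute-force comparison of fixed-length windows across sequences, and the names `similar_window_pairs`, `hamming_similarity` and `neg_euclidean`, as well as the window length and threshold parameters, are assumptions made for illustration only.

```python
import math
from itertools import combinations


def hamming_similarity(a, b):
    """Count matching positions between two equal-length windows (categorical data)."""
    return float(sum(x == y for x, y in zip(a, b)))


def neg_euclidean(a, b):
    """Negated Euclidean distance, so larger values mean more similar (real-valued data)."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_window_pairs(sequences, length, similarity, threshold):
    """Return pairs of (sequence index, offset) whose windows score >= threshold.

    Hypothetical helper: every window of the given length is compared with
    every window from a different sequence, using the caller's metric.
    """
    windows = [
        (s, i, seq[i:i + length])
        for s, seq in enumerate(sequences)
        for i in range(len(seq) - length + 1)
    ]
    return [
        ((s1, i1), (s2, i2))
        for (s1, i1, w1), (s2, i2, w2) in combinations(windows, 2)
        if s1 != s2 and similarity(w1, w2) >= threshold
    ]


# Categorical data: DNA-like strings, at least 3 of 4 positions must match.
dna_pairs = similar_window_pairs(["ACGTAC", "TTCGTA"], 4, hamming_similarity, 3)

# Real-valued data: windows within Euclidean distance 0.2 of each other.
signal_pairs = similar_window_pairs(
    [[0.0, 1.0, 2.0, 5.0], [9.0, 1.1, 2.1, 7.0]], 2, neg_euclidean, -0.2
)
```

Because the metric is passed in as a function, the same loop serves both data types. The exhaustive pairwise comparison is quadratic in the number of windows, so this sketch is only suitable for small inputs.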

AVAILABILITY

Gemoda is freely available at http://web.mit.edu/bamel/gemoda
