Finding optimal degenerate patterns in DNA sequences.

Author information

Shinozaki Daisuke, Akutsu Tatsuya, Maruyama Osamu

Affiliation

Graduate School of Mathematics, Kyushu University, Fukuoka, Japan.

Publication information

Bioinformatics. 2003 Oct;19 Suppl 2:ii206-14. doi: 10.1093/bioinformatics/btg1079.

DOI:10.1093/bioinformatics/btg1079

PMID:14534191

Abstract

MOTIVATION

Finding transcription factor binding sites in the upstream regions of given genes is an interesting and algorithmically challenging problem in computational biology. A degenerate pattern over a finite alphabet Sigma is a sequence of subsets of Sigma. A string over the IUPAC nucleic acid codes is a degenerate pattern over Sigma = {A, C, G, T}, and such strings are one of the major pattern classes used to model transcription factor binding sites in the upstream regions of genes. However, the problem of finding a degenerate pattern consistent with both positive and negative string sets is known to be NP-complete in general. Our aim is to devise a heuristic algorithm that finds a degenerate pattern that is optimal for the positive and negative string sets with respect to a given score function.
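To make the definition concrete, the following minimal sketch represents a degenerate pattern as a sequence of subsets of Sigma via the standard IUPAC nucleotide codes and checks whether it occurs in a string. The function names (`to_subsets`, `occurs_in`, `consistent`) are illustrative and not taken from the paper, and "consistent" is assumed here to mean that the pattern occurs in every positive string and in no negative string.

```python
# Standard IUPAC nucleotide codes and the subsets of {A, C, G, T} they denote.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def to_subsets(pattern):
    """Turn an IUPAC string into a sequence of subsets of Sigma."""
    return [frozenset(IUPAC[code]) for code in pattern]


def occurs_in(pattern, text):
    """True if the degenerate pattern matches some window of text."""
    subsets = to_subsets(pattern)
    k = len(subsets)
    return any(
        all(text[i + j] in subsets[j] for j in range(k))
        for i in range(len(text) - k + 1)
    )


def consistent(pattern, positives, negatives):
    """Assumed notion of consistency: hits all positives, no negatives."""
    return (all(occurs_in(pattern, s) for s in positives)
            and not any(occurs_in(pattern, s) for s in negatives))
```

For example, the pattern `CAYGTG` (with `Y` standing for `{C, T}`) occurs in both `AACACGTGTT` and `AACATGTGTT`, but not in `AACAGGTGTT`.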

RESULTS

We propose an enumerative algorithm with a pruning technique, called SUPERPOSITION, for finding optimal degenerate patterns; it works with almost all reasonable score functions. Its performance scores were compared with those of the popular motif-finding algorithms YMF, MEME and AlignACE on various sets of co-regulated yeast genes. In these computational experiments, SUPERPOSITION outperformed the others on several gene sets.
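The abstract does not describe how SUPERPOSITION enumerates or prunes, so the sketch below is not that algorithm. It only illustrates the general idea of enumeration with pruning under stated assumptions: patterns have a fixed length, the score is the number of positive strings hit minus the number of negative strings hit, and a branch is cut when even a perfect completion could not beat the best score found so far. The names `hits` and `best_pattern` are hypothetical.

```python
# Standard IUPAC nucleotide codes and the subsets of {A, C, G, T} they denote.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def hits(subsets, text):
    """True if the sequence of subsets matches some window of text."""
    k = len(subsets)
    return any(
        all(text[i + j] in subsets[j] for j in range(k))
        for i in range(len(text) - k + 1)
    )


def best_pattern(positives, negatives, length):
    """Depth-first search over IUPAC patterns of the given length.

    Assumed score: (#positives hit) - (#negatives hit).
    Returns a (score, pattern) pair.
    """
    best = (float("-inf"), None)
    codes = sorted(IUPAC)

    def extend(prefix, subsets):
        nonlocal best
        pos = sum(hits(subsets, s) for s in positives)
        # Extending a pattern can only lose occurrences, so pos is an
        # upper bound on the score of every completion of this prefix.
        if pos <= best[0]:
            return
        if len(prefix) == length:
            score = pos - sum(hits(subsets, s) for s in negatives)
            if score > best[0]:
                best = (score, prefix)
            return
        for code in codes:
            extend(prefix + code, subsets + [frozenset(IUPAC[code])])

    extend("", [])
    return best
```

A call such as `best_pattern(["ACGT", "TCGA"], ["AAAA", "TTTT"], 2)` searches all two-letter IUPAC patterns. The bound is valid only for score functions that cannot increase when positive hits are lost, which is an assumption of this sketch rather than a claim about the paper's score functions.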

AVAILABILITY

The Python script SUPERPOSITION is available at http://www.math.kyushu-u.ac.jp/~om/softwares.html
