

MUSA: a parameter free algorithm for the identification of biologically significant motifs.

Author information

Mendes Nuno D, Casimiro Ana C, Santos Pedro M, Sá-Correia Isabel, Oliveira Arlindo L, Freitas Ana T

Affiliations

INESC-ID, Instituto Superior Técnico, Rua Alves Redol 9 1000-029 Lisboa, Portugal.

Publication information

Bioinformatics. 2006 Dec 15;22(24):2996-3002. doi: 10.1093/bioinformatics/btl537. Epub 2006 Oct 26.

Abstract

MOTIVATION

The ability to identify complex motifs, i.e. non-contiguous nucleotide sequences, is a key feature of modern motif finders. Addressing this problem is extremely important, not only because these motifs can accurately model biological phenomena but also because their extraction is highly dependent upon the appropriate selection of numerous search parameters. Currently available combinatorial algorithms have proved to be highly efficient in exhaustively enumerating motifs (including complex motifs) that fulfill certain extraction criteria. However, one major problem with these methods is the large number of parameters that need to be specified.

RESULTS

We propose a new algorithm, MUSA (Motif finding using an UnSupervised Approach), that can be used either to autonomously find over-represented complex motifs or to estimate search parameters for modern motif finders. This method relies on a biclustering algorithm that operates on a matrix of co-occurrences of small motifs. The performance of this method is independent of the composite structure of the motifs being sought, making few assumptions about their characteristics. The MUSA algorithm was applied to two datasets involving the bacterium Pseudomonas putida KT2440. The first one was composed of 70 sigma(54)-dependent promoter sequences and the second dataset included 54 promoter sequences of up-regulated genes in response to phenol, as suggested by quantitative proteomics. The results obtained indicate that this approach is very effective at identifying complex motifs of biological significance.
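To make the idea of a "matrix of co-occurrences of small motifs" more concrete, the sketch below counts, for every pair of short words (k-mers), how many input sequences contain both. It is only an illustration under simplifying assumptions: the abstract does not specify how MUSA builds its matrix or which biclustering algorithm it uses, so the function names (`kmers_in`, `cooccurrence_matrix`), the choice of `k`, the per-sequence counting rule and the toy sequences are all hypothetical and are not taken from the authors' implementation.

```python
from itertools import product


def kmers_in(seq, k):
    """Return the set of distinct k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cooccurrence_matrix(sequences, k=2, alphabet="ACGT"):
    """Count, for each pair of k-mers, the sequences that contain both.

    Hypothetical simplification: positions and distances between the two
    k-mers are ignored; only joint presence in a sequence is counted.
    """
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    matrix = [[0] * len(words) for _ in words]
    for seq in sequences:
        # Indices of the k-mers present in this sequence (unknown symbols skipped).
        present = sorted(index[w] for w in kmers_in(seq.upper(), k) if w in index)
        for i in present:
            for j in present:
                matrix[i][j] += 1
    return words, matrix


# Toy input (made up for illustration, not real promoter sequences).
toy_sequences = ["ACGT", "ACGA", "TTTT"]
words, matrix = cooccurrence_matrix(toy_sequences, k=2)
row = {w: i for i, w in enumerate(words)}
print(matrix[row["AC"]][row["CG"]])  # number of sequences containing both AC and CG
```

A biclustering step would then look for subsets of rows and columns of such a matrix with consistently high counts, which is the kind of structure the abstract says MUSA exploits to find over-represented complex motifs.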

AVAILABILITY

The MUSA algorithm is available upon request from the authors, and will be made available via a Web-based interface.

