MUSA: a parameter free algorithm for the identification of biologically significant motifs.

Author information

Mendes Nuno D, Casimiro Ana C, Santos Pedro M, Sá-Correia Isabel, Oliveira Arlindo L, Freitas Ana T

Affiliation

INESC-ID, Instituto Superior Técnico, Rua Alves Redol 9 1000-029 Lisboa, Portugal.

Publication information

Bioinformatics. 2006 Dec 15;22(24):2996-3002. doi: 10.1093/bioinformatics/btl537. Epub 2006 Oct 26.

DOI:10.1093/bioinformatics/btl537

PMID:17068086

Abstract

MOTIVATION

The ability to identify complex motifs, i.e. non-contiguous nucleotide sequences, is a key feature of modern motif finders. Addressing this problem is extremely important, not only because these motifs can accurately model biological phenomena but also because their extraction is highly dependent upon the appropriate selection of numerous search parameters. Currently available combinatorial algorithms have proved to be highly efficient in exhaustively enumerating motifs (including complex motifs) that fulfill certain extraction criteria. However, one major problem with these methods is the large number of parameters that need to be specified.

RESULTS

We propose a new algorithm, MUSA (Motif finding using an UnSupervised Approach), that can be used either to autonomously find over-represented complex motifs or to estimate search parameters for modern motif finders. This method relies on a biclustering algorithm that operates on a matrix of co-occurrences of small motifs. The performance of this method is independent of the composite structure of the motifs being sought, and the method makes few assumptions about their characteristics. The MUSA algorithm was applied to two datasets involving the bacterium Pseudomonas putida KT2440. The first dataset was composed of 70 σ54-dependent promoter sequences and the second included 54 promoter sequences of genes up-regulated in response to phenol, as suggested by quantitative proteomics. The results obtained indicate that this approach is very effective at identifying complex motifs of biological significance.
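The abstract does not specify how the co-occurrence matrix is built, so the following is only a minimal sketch of one plausible reading: for every pair of short motifs (here, fixed-length k-mers), count the number of input sequences that contain both. The function names, the choice of k, and the toy sequences are all illustrative assumptions rather than the authors' implementation, and the biclustering step itself is not shown.

```python
from collections import Counter
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cooccurrence_matrix(sequences, k):
    """Sparse co-occurrence counts (hypothetical helper, not from the paper).

    Maps each unordered pair of distinct k-mers, stored as a sorted tuple,
    to the number of sequences in which both k-mers occur.
    """
    counts = Counter()
    for seq in sequences:
        present = sorted(kmers(seq.upper(), k))
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts


# Invented toy sequences; three of them share GGCA and TGCA.
toy = [
    "TTGGCACGATATTTGCA",
    "CCTGGCACAGTCTTGCA",
    "ATGGCACTTTGCAGGAC",
    "GGGACCCTTAAACCGTA",
]
matrix = cooccurrence_matrix(toy, 4)
print(matrix[("GGCA", "TGCA")])
```

Pairs with high counts are candidates for the two halves of a non-contiguous motif. A biclustering algorithm would then look for groups of small motifs that co-occur consistently, which is the step the abstract attributes to MUSA.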

AVAILABILITY

The MUSA algorithm is available upon request from the authors, and will be made available via a Web-based interface.

Similar articles

SPACER: identification of cis-regulatory elements with non-contiguous critical residues.

Bioinformatics. 2007 Apr 15;23(8):1029-31. doi: 10.1093/bioinformatics/btm041.

Finding motifs from all sequences with and without binding sites.

Bioinformatics. 2006 Sep 15;22(18):2217-23. doi: 10.1093/bioinformatics/btl371. Epub 2006 Jul 26.

Regulatory motif finding by logic regression.

Bioinformatics. 2004 Nov 1;20(16):2799-811. doi: 10.1093/bioinformatics/bth333. Epub 2004 May 27.

Detection of generic spaced motifs using submotif pattern mining.

Bioinformatics. 2007 Jun 15;23(12):1476-85. doi: 10.1093/bioinformatics/btm118. Epub 2007 May 5.

A generic motif discovery algorithm for sequential data.

Bioinformatics. 2006 Jan 1;22(1):21-8. doi: 10.1093/bioinformatics/bti745. Epub 2005 Oct 27.

A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length.

Bioinformatics. 2005 May 15;21(10):2240-5. doi: 10.1093/bioinformatics/bti336. Epub 2005 Feb 22.

Apples to apples: improving the performance of motif finders and their significance analysis in the Twilight Zone.

Bioinformatics. 2006 Jul 15;22(14):e393-401. doi: 10.1093/bioinformatics/btl245.

Transcription factor binding site identification using the self-organizing map.

Bioinformatics. 2005 May 1;21(9):1807-14. doi: 10.1093/bioinformatics/bti256. Epub 2005 Jan 12.

On counting position weight matrix matches in a sequence, with application to discriminative motif finding.

Bioinformatics. 2006 Jul 15;22(14):e454-63. doi: 10.1093/bioinformatics/btl227.

Cited by

Review of Different Sequence Motif Finding Algorithms.

Avicenna J Med Biotechnol. 2019 Apr-Jun;11(2):130-148.

Transcriptional profiling of Arabidopsis root hairs and pollen defines an apical cell growth signature.

BMC Plant Biol. 2014 Aug 1;14:197. doi: 10.1186/s12870-014-0197-3.

Models incorporating chromatin modification data identify functionally important p53 binding sites.

Nucleic Acids Res. 2013 Jun;41(11):5582-93. doi: 10.1093/nar/gkt260. Epub 2013 Apr 17.

Direct vs 2-stage approaches to structured motif finding.

Algorithms Mol Biol. 2012 Aug 21;7(1):20. doi: 10.1186/1748-7188-7-20.

Yeast IME2 functions early in meiosis upstream of cell cycle-regulated SBF and MBF targets.

PLoS One. 2012;7(2):e31575. doi: 10.1371/journal.pone.0031575. Epub 2012 Feb 29.

Functional gene expression profiling in yeast implicates translational dysfunction in mutant huntingtin toxicity.

J Biol Chem. 2011 Jan 7;286(1):410-9. doi: 10.1074/jbc.M110.101527. Epub 2010 Nov 2.

A survey of DNA motif finding algorithms.

BMC Bioinformatics. 2007 Nov 1;8 Suppl 7(Suppl 7):S21. doi: 10.1186/1471-2105-8-S7-S21.

YEASTRACT-DISCOVERER: new tools to improve the analysis of transcriptional regulatory associations in Saccharomyces cerevisiae.

Nucleic Acids Res. 2008 Jan;36(Database issue):D132-6. doi: 10.1093/nar/gkm976. Epub 2007 Nov 21.
