Bailey T L, Elkan C
Department of Computer Science and Engineering, University of California at San Diego, La Jolla 92093-0114, USA.
Proc Int Conf Intell Syst Mol Biol. 1995;3:21-9.
MEME is a tool for discovering motifs in sets of protein or DNA sequences. This paper describes several extensions to MEME which increase its ability to find motifs in a totally unsupervised fashion, but which also allow it to benefit when prior knowledge is available. When no background knowledge is asserted, MEME obtains increased robustness from a method for determining motif widths automatically, and from probabilistic models that allow motifs to be absent in some input sequences. On the other hand, MEME can exploit prior knowledge about a motif being present in all input sequences, about the length of a motif and whether it is a palindrome, and (using Dirichlet mixtures) about expected patterns in individual motif positions. Extensive experiments are reported which support the claim that MEME benefits from, but does not require, background knowledge. The experiments use seven previously studied DNA and protein sequence families and 75 of the protein families documented in the Prosite database of sites and patterns, Release 11.1.