Rouchka Eric C, Hardin C Timothy
Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA.
BMC Bioinformatics. 2007 Aug 7;8:292. doi: 10.1186/1471-2105-8-292.
Detecting short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains, such as transcription factor and DNA binding sites, as well as conserved protein domains. To help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed rMotifGen, a computational tool whose sole purpose is to generate random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are introduced according to user-defined parameters and substitution matrices. The resulting sequences can be useful in mutational simulations and in testing the limits of motif detection algorithms.
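The idea of deriving motif instances from a consensus can be illustrated with a minimal Python sketch. It is not rMotifGen's implementation: the function name `mutate_motif`, the parameters `mutation_rate` and `insertion_rate`, and the substitution probabilities in `SUBSTITUTION` are all hypothetical, chosen only to show how user-defined rates and a substitution table might interact.

```python
import random

DNA = "ACGT"

# Hypothetical substitution probabilities (illustrative values only):
# transitions (A<->G, C<->T) are favored over transversions.
SUBSTITUTION = {
    "A": {"G": 0.6, "C": 0.2, "T": 0.2},
    "G": {"A": 0.6, "C": 0.2, "T": 0.2},
    "C": {"T": 0.6, "A": 0.2, "G": 0.2},
    "T": {"C": 0.6, "A": 0.2, "G": 0.2},
}


def mutate_motif(consensus, mutation_rate, insertion_rate, rng):
    """Return one motif instance derived from `consensus`.

    Each position is substituted with probability `mutation_rate`, and a
    random base is inserted after it with probability `insertion_rate`.
    """
    out = []
    for base in consensus:
        if rng.random() < mutation_rate:
            targets = list(SUBSTITUTION[base])
            weights = [SUBSTITUTION[base][t] for t in targets]
            base = rng.choices(targets, weights=weights, k=1)[0]
        out.append(base)
        if rng.random() < insertion_rate:
            out.append(rng.choice(DNA))
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for _ in range(5):
        print(mutate_motif("TATAAT", 0.2, 0.05, rng))
```

Raising `mutation_rate` yields less conserved instances, which is the kind of knob one would turn to probe where a motif finder stops recovering the planted signal.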
Two implementations of rMotifGen have been created: one provides a graphical user interface (GUI) for random motif construction, and the other is a command-line interface. The command-line implementation has the added advantages of platform independence and the ability to run in batch mode. rMotifGen was used to construct sample sets of sequences containing DNA motifs and amino acid motifs, which were then used to test the Gibbs sampler and MEME packages.
rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs. Each motif instance can either be generated from a position-specific scoring matrix (PSSM) or be created by mutating its corresponding consensus under an evolutionary model based on substitution matrices. rMotifGen is freely available at: http://bioinformatics.louisville.edu/brg/rMotifGen/.
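As a rough illustration of the PSSM route, the following Python sketch samples a motif instance position by position and plants it at a random offset in a uniformly random background sequence. It is a sketch under stated assumptions, not rMotifGen's code: the matrix is treated as per-position base probabilities, and the names `PSSM`, `sample_motif`, and `plant_motif` are hypothetical.

```python
import random

ALPHABET = "ACGT"

# Hypothetical 4-position matrix; each row holds P(A), P(C), P(G), P(T).
PSSM = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]


def sample_motif(pssm, rng):
    """Draw one motif instance, one base per matrix row."""
    return "".join(rng.choices(ALPHABET, weights=row, k=1)[0] for row in pssm)


def plant_motif(seq_len, motif, rng):
    """Build a random background of length `seq_len` and overwrite a
    randomly chosen window with `motif`. Returns (sequence, start index)."""
    if len(motif) > seq_len:
        raise ValueError("motif is longer than the requested sequence")
    start = rng.randrange(seq_len - len(motif) + 1)
    background = [rng.choice(ALPHABET) for _ in range(seq_len)]
    background[start:start + len(motif)] = motif
    return "".join(background), start


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so runs are repeatable
    for i in range(3):
        seq, pos = plant_motif(40, sample_motif(PSSM, rng), rng)
        print(f">seq{i} motif_start={pos}")
        print(seq)
```

Recording the planted start position alongside each sequence gives a ground truth against which a motif finder's predictions can be scored.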