Gorodkin J, Stricklin S L, Stormo G D
Department of Genetics and Ecology, The Institute of Biological Sciences, University of Aarhus, Building 540, Ny Munkegade, DK-8000 Aarhus C, Denmark.
Nucleic Acids Res. 2001 May 15;29(10):2135-44. doi: 10.1093/nar/29.10.2135.
Post-transcriptional regulation of gene expression is often accomplished by proteins binding to specific sequence motifs in mRNA molecules, to affect their translation or stability. The motifs are often composed of a combination of sequence and structural constraints such that the overall structure is preserved even though much of the primary sequence is variable. While several methods exist to discover transcriptional regulatory sites in the DNA sequences of coregulated genes, the RNA motif discovery problem is much more difficult because of covariation in the positions. We describe the combined use of two approaches for RNA structure prediction, FOLDALIGN and COVE, that together can discover and model stem-loop RNA motifs in unaligned sequences, such as UTRs from post-transcriptionally coregulated genes. We evaluate the method on two datasets, one a section of rRNA genes with randomly truncated ends so that a global alignment is not possible, and the other a hyper-variable collection of IRE-like elements that were inserted into randomized UTR sequences. In both cases the combined method identified the motifs correctly, and in the rRNA example we show that it is capable of determining the structure, which includes bulge and internal loops as well as a variable length hairpin loop. Those automated results are quantitatively evaluated and found to agree closely with structures contained in curated databases, with correlation coefficients up to 0.9. A basic server, Stem-Loop Align SearcH (SLASH), which will perform stem-loop searches in unaligned RNA sequences, is available at http://www.bioinf.au.dk/slash/.
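The quantitative evaluation mentioned above compares predicted structures with curated ones through a correlation coefficient. The abstract does not say how that coefficient is computed, so the following is only a minimal sketch under an assumption: a Matthews-style correlation over base pairs, with both structures written in dot-bracket notation for the same sequence. The function names `base_pairs` and `pair_correlation` are hypothetical and are not part of FOLDALIGN, COVE, or SLASH.

```python
import math


def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def pair_correlation(predicted, reference):
    """Matthews-style correlation between two structures of equal length.

    Assumption: every possible pair of positions counts as one observation,
    so true negatives are the pairs absent from both structures.
    """
    if len(predicted) != len(reference):
        raise ValueError("structures must describe the same sequence length")
    pred, ref = base_pairs(predicted), base_pairs(reference)
    n = len(reference)
    total = n * (n - 1) // 2          # all candidate position pairs
    tp = len(pred & ref)              # pairs present in both
    fp = len(pred - ref)              # predicted but not in the reference
    fn = len(ref - pred)              # in the reference but missed
    tn = total - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when the coefficient is undefined (e.g. no pairs predicted).
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    reference = "((((...))))"
    predicted = "(((.....)))"   # misses the innermost reference pair
    print(round(pair_correlation(predicted, reference), 3))
```

A prediction identical to the reference scores 1.0, and each missed or spurious base pair lowers the value. Other conventions exist, for example approximating the coefficient from sensitivity and positive predictive value alone, so this should not be read as the authors' exact procedure.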