Steigele Stephan, Huber Wolfgang, Stocsits Claudia, Stadler Peter F, Nieselt Kay
Wilhelm-Schickard-Institut für Informatik, ZBIT-Center for Bioinformatics Tübingen, University of Tübingen, Tübingen, Germany.
BMC Biol. 2007 Jun 18;5:25. doi: 10.1186/1741-7007-5-25.
Non-coding RNAs (ncRNAs) are an emerging focus of both computational analysis and experimental research, which have uncovered a growing number of novel non-protein-coding transcripts, often of unknown function. Whole-genome screens in higher eukaryotes, for example, have provided evidence for a surprisingly large number of ncRNAs. To supplement these searches, we performed a computational analysis of seven yeast species and searched for new ncRNAs and RNA motifs.
A comparative analysis of the genomes of seven yeast species yielded roughly 2800 genomic loci that showed the hallmarks of evolutionarily conserved RNA secondary structures. A total of 74% of these regions overlapped with annotated non-coding or coding genes in yeast. Coding sequences that carry predicted structured RNA elements belong to a limited number of groups with common functions, suggesting that these RNA elements are involved in post-transcriptional regulation and/or cellular localization. About 700 conserved RNA structures were found outside annotated coding sequences and known ncRNA genes. Many of these predicted elements overlapped with UTRs of particular classes of protein-coding genes. In addition, a number of RNA elements overlapped with previously characterized antisense transcripts. Transcription of about 120 predicted elements located in promoter regions and other, previously unannotated, intergenic regions was supported by tiling array experiments, ESTs, or SAGE data.
Our computational predictions strongly suggest that yeasts harbor a substantial pool of several hundred novel ncRNAs. In addition, we describe a large number of RNA structures located in coding sequences and within antisense transcripts that were previously characterized using tiling arrays.
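The overlap figures reported above (roughly 2800 loci, 74% overlapping annotated genes, about 700 elsewhere) come from intersecting predicted loci with genome annotation. The following is a minimal sketch of that kind of bookkeeping, not the authors' pipeline: the `Interval` type, the `split_by_annotation` helper, the half-open coordinate convention, and the toy coordinates are all assumptions made for illustration, and strand is ignored.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """Hypothetical genomic interval: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def split_by_annotation(
    loci: List[Interval], annotations: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Partition predicted loci into annotated (overlapping) and novel ones."""
    annotated, novel = [], []
    for locus in loci:
        if any(overlaps(locus, ann) for ann in annotations):
            annotated.append(locus)
        else:
            novel.append(locus)
    return annotated, novel


# Toy data only; these coordinates are invented for the example.
annotations = [Interval("chrI", 100, 500), Interval("chrII", 0, 300)]
loci = [
    Interval("chrI", 450, 520),   # overlaps the first annotation
    Interval("chrI", 600, 650),   # no overlap
    Interval("chrII", 300, 350),  # adjacent to, but not overlapping, the second
    Interval("chrII", 250, 260),  # inside the second annotation
]

annotated, novel = split_by_annotation(loci, annotations)
fraction_annotated = len(annotated) / len(loci)

# Consistency check on the abstract's rounded figures:
# 26% of roughly 2800 loci lie outside annotated genes.
expected_novel = round(2800 * (1 - 0.74))
```

With the abstract's rounded numbers, `expected_novel` evaluates to 728, which agrees with the reported "about 700" structures outside annotated genes. The quadratic scan is adequate for a few thousand loci; an interval tree or a sorted sweep would be the usual choice at larger scale.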