Fedorov Alexei, Stombaugh Jesse, Harr Michael W, Yu Saihua, Nasalean Lorena, Shepelev Valery
Department of Medicine, Program in Bioinformatics and Proteomics/Genomics, Medical University of Ohio, Toledo, OH 43614, USA.
Nucleic Acids Res. 2005 Aug 10;33(14):4578-83. doi: 10.1093/nar/gki754. Print 2005.
Based on comparative genomics, we created a bioinformatic package for computational prediction of small nucleolar RNA (snoRNA) genes in mammalian introns. The core of our approach is the Mammalian Orthologous Intron Database (MOID), which contains all known introns within the human, mouse and rat genomes. Introns of orthologous genes from these three species that occupy the same position relative to the reading frame are grouped in a dedicated orthologous intron table. Our program SNO.pl searches MOID for conserved snoRNA motifs and reports every case in which characteristic snoRNA-like structures are present in all three orthologous introns of human, mouse and rat. Here we report an example of using SNO.pl to search for a particular pattern of conserved C/D-box snoRNA motifs (canonical C- and D-boxes and a 6 nt long terminal stem). In this analysis, we detected 57 triplets of snoRNA-like structures across the three mammals. Of these, 15 triplets represented known C/D-box snoRNA genes, and six represented snoRNA genes that had previously been only partially characterized in the mouse genome. One case represented a novel snoRNA gene, and another three cases represented putative snoRNAs. Our programs are publicly available and can easily be adapted or modified to search for any conserved motif within mammalian introns.
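The search pattern described above (a canonical C-box, a canonical D-box and a 6 nt terminal stem, required in all three orthologous introns) can be illustrated with a minimal Python sketch. This is not the SNO.pl implementation, whose details are not given here. The function names, the exact-match box sequences (TGATGA and CTGA in DNA notation), the placement of the stem arms immediately next to the boxes, the strict Watson-Crick pairing and the 30 to 200 nt spacing window are all assumptions made for illustration.

```python
import re

# Assumed motif definitions (DNA notation); SNO.pl may use different ones.
C_BOX = "TGATGA"            # canonical C-box, UGAUGA in RNA
D_BOX = "CTGA"              # canonical D-box, CUGA in RNA
STEM_LEN = 6                # terminal stem length stated in the abstract
MIN_GAP, MAX_GAP = 30, 200  # hypothetical C-box to D-box spacing window

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cd_candidates(intron):
    """Return (start, end) spans of C/D-box snoRNA-like structures.

    A candidate is a C-box preceded by a 6 nt arm and a downstream D-box
    followed by a 6 nt arm, where the two arms are perfectly complementary
    (Watson-Crick pairs only, an assumption of this sketch).
    """
    intron = intron.upper()
    hits = []
    # Lookahead patterns so that overlapping motif occurrences are found.
    for c in re.finditer(f"(?={C_BOX})", intron):
        c_start = c.start()
        if c_start < STEM_LEN:
            continue  # no room for the 5' stem arm
        left_arm = intron[c_start - STEM_LEN:c_start]
        for d in re.finditer(f"(?={D_BOX})", intron):
            d_start = d.start()
            gap = d_start - (c_start + len(C_BOX))
            if not MIN_GAP <= gap <= MAX_GAP:
                continue
            d_end = d_start + len(D_BOX)
            right_arm = intron[d_end:d_end + STEM_LEN]
            if len(right_arm) == STEM_LEN and right_arm == reverse_complement(left_arm):
                hits.append((c_start - STEM_LEN, d_end + STEM_LEN))
    return hits


def conserved_in_triplet(human, mouse, rat):
    """True if every intron of an orthologous triplet has a candidate."""
    return all(find_cd_candidates(seq) for seq in (human, mouse, rat))
```

In use, `conserved_in_triplet` would be called once per row of an orthologous intron table, with the three intron sequences of that row as arguments. Requiring a hit in all three species is what makes the filter comparative: a motif combination that arises by chance in one intron is unlikely to recur in both orthologs.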