Doyle Francis, Zaleski Christopher, George Ajish D, Stenson Erin K, Ricciardi Adele, Tenenbaum Scott A
Department of Biomedical Sciences, University at Albany-SUNY, School of Public Health, Rensselaer, NY, USA.
Methods Mol Biol. 2008;419:39-52. doi: 10.1007/978-1-59745-033-1_3.
The untranslated regions (UTRs) of many mRNAs contain sequence and structural motifs that regulate the stability, localization, and translatability of the mRNA. It should be possible to discover previously unidentified RNA regulatory motifs by examining many related nucleotide sequences that are assumed to contain a common motif. This is general practice for discovering sequence-based patterns in DNA, where alignment tools are heavily exploited. However, because RNA-based motifs combine sequence and structural components, simple alignment tools are frequently inadequate. The consensus sequences they find often allow significant variability at any given position and are only loosely characterized. The development of RNA-motif discovery tools that infer structural information and integrate it into motif discovery is both necessary and expedient. Here, we provide a selected list of existing web-accessible algorithms for the discovery of RNA motifs, which, although not exhaustive, represents the current state of the art. To facilitate the development, evaluation, and training of new software programs that identify RNA motifs, we created the UAlbany training UTR (TUTR) database, a collection of validated sets of sequences containing experimentally defined regulatory motifs. Presently, eleven training sets have been generated, with associated indexes and "answer sets" that identify where the previously characterized RNA motif [the iron responsive element (IRE), AU-rich class-2 element (ARE), selenocysteine insertion sequence (SECIS), etc.] resides in each sequence. The UAlbany TUTR collection is a shared resource that is available to researchers for software development and as a research aid.
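As an illustration of how an "answer set" could be used to evaluate a motif-discovery program, the following minimal sketch scores predicted motif locations against annotated ones. The data layout (a mapping from sequence identifier to a 0-based, half-open start/end span), the function names, and the 50% overlap threshold are all assumptions made for this example; they do not describe the actual TUTR file format or any scoring scheme used by the authors.

```python
from typing import Dict, Tuple

# Hypothetical representation: 0-based, half-open interval [start, end)
Span = Tuple[int, int]


def overlap(a: Span, b: Span) -> int:
    """Number of positions shared by two half-open spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def score_predictions(
    answers: Dict[str, Span],
    predictions: Dict[str, Span],
    min_fraction: float = 0.5,  # assumed threshold, not taken from the source
) -> float:
    """Fraction of annotated motifs that a tool recovered.

    A motif counts as recovered when the predicted span covers at least
    `min_fraction` of the annotated span in the same sequence.
    """
    if not answers:
        raise ValueError("answer set is empty")
    hits = 0
    for seq_id, true_span in answers.items():
        pred = predictions.get(seq_id)
        if pred is None:
            continue  # the tool reported nothing for this sequence
        motif_len = true_span[1] - true_span[0]
        if motif_len > 0 and overlap(true_span, pred) / motif_len >= min_fraction:
            hits += 1
    return hits / len(answers)


# Made-up example data: three annotated sequences, two predictions.
answers = {"seq1": (10, 40), "seq2": (5, 35), "seq3": (100, 130)}
predictions = {"seq1": (12, 42), "seq2": (60, 90)}
recovered = score_predictions(answers, predictions)
print(f"recovered fraction: {recovered:.2f}")
```

In this made-up example only the prediction for `seq1` overlaps its annotated motif, so one of the three annotated motifs counts as recovered. A real evaluation would also need to account for false-positive predictions and for structural agreement, which this sketch ignores.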