Le Shu-Yun, Shapiro Bruce A
Center for Cancer Research Nanobiology Program, NCI Center for Cancer Research, National Cancer Institute, Frederick, MD, USA.
Wiley Interdiscip Rev Data Min Knowl Discov. 2011 Jan-Feb;1(1):88-95. doi: 10.1002/widm.13. Epub 2011 Jan 10.
The normal functions of genomes depend on the precise expression of messenger RNAs and noncoding RNAs (ncRNAs), such as transfer RNAs and microRNAs in eukaryotes. These ncRNAs and functional RNA structures (FRSs) act as regulators or response elements for cellular factors and participate in transcription, posttranscriptional processing, and translation. Discovering these FRSs in large DNA/RNA sequence databases is an important step toward our goal of moving from genomic sequence data to biological knowledge of RNA-based regulation. Analyses of a large number of FRSs indicate that an FRS can be well characterized by quantitative measures such as the significance score and the well-ordered score of the local segment. Various data mining tools have been developed and successfully applied to FRS discovery in genomic sequence databases. Here, we summarize our efforts in the computational discovery of structured features of ncRNAs and FRSs within complex genomes using EDscan and SigED.