Computer Science Department, University of Crete, Heraklion, Greece.
PLoS One. 2010 Aug 6;5(8):e11843. doi: 10.1371/journal.pone.0011843.
MicroRNAs (miRNAs) are small, single-stranded RNAs that play a key role in the post-transcriptional regulation of thousands of genes across numerous species. While several computational methods are available for identifying miRNA genes, accurate prediction of the mature miRNA remains a challenge. Existing approaches fall short not only in predicting the location of mature miRNAs but also in identifying the functional strand(s) of miRNA precursors.
METHODOLOGY/PRINCIPAL FINDINGS: Here, we present a computational tool that uses a Naive Bayes classifier to identify mature miRNA candidates from the sequence and secondary structure of their miRNA precursors. We train on both positive examples (true mature miRNAs) and negative examples (same-size sequences from the precursor that are not mature miRNAs) to optimize sensitivity as well as specificity. Our method accurately predicts the start position of experimentally verified mature miRNAs for both human and mouse, achieving significantly higher (often double) accuracy than two existing methods. Moreover, the method generalizes very well to miRNAs from two other organisms. More importantly, our method provides direct evidence about the features of miRNA precursors that may determine the location of the mature miRNA. We find that the triplet of positions 7, 8 and 9, counted from the mature miRNA end towards the closest hairpin, has the largest discriminatory power, is relatively conserved in sequence composition (mostly containing a uracil) and is located within or very close to the hairpin loop, suggesting the existence of a possible recognition site for Dicer and associated proteins.
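The general idea of scoring candidate windows with a Naive Bayes classifier can be illustrated with a minimal sketch. It is not the published implementation: the fixed 22-nt window, the per-position feature (nucleotide paired with its dot-bracket structure symbol), the add-one smoothing, and all function names are assumptions made for illustration only. Every window on a precursor is treated as a candidate; the window at the annotated start is the positive example and all other same-size windows are negatives.

```python
import math
from collections import defaultdict

WINDOW = 22        # assumed mature miRNA length (illustrative, not from the paper)
N_SYMBOLS = 4 * 3  # A/C/G/U combined with '(' ')' '.' (assumed feature alphabet)


def features(seq, struct, start):
    """One categorical feature per window position: nucleotide + structure symbol."""
    return [seq[start + i] + struct[start + i] for i in range(WINDOW)]


def train(examples):
    """Count features per class and position.

    examples: list of (sequence, dot_bracket_structure, true_start) tuples.
    The window at true_start is the positive; every other window is a negative.
    """
    counts = {label: [defaultdict(int) for _ in range(WINDOW)]
              for label in (True, False)}
    totals = {True: 0, False: 0}
    for seq, struct, true_start in examples:
        for start in range(len(seq) - WINDOW + 1):
            label = (start == true_start)
            totals[label] += 1
            for i, feat in enumerate(features(seq, struct, start)):
                counts[label][i][feat] += 1
    return counts, totals


def log_odds(model, seq, struct, start):
    """Log-odds that the window at `start` is a mature miRNA (add-one smoothing)."""
    counts, totals = model
    score = math.log((totals[True] + 1) / (totals[False] + 1))  # smoothed class prior
    for i, feat in enumerate(features(seq, struct, start)):
        p_pos = (counts[True][i].get(feat, 0) + 1) / (totals[True] + N_SYMBOLS)
        p_neg = (counts[False][i].get(feat, 0) + 1) / (totals[False] + N_SYMBOLS)
        score += math.log(p_pos / p_neg)
    return score


def predict_start(model, seq, struct):
    """Return the start position of the highest-scoring window on the precursor."""
    candidates = range(len(seq) - WINDOW + 1)
    return max(candidates, key=lambda s: log_odds(model, seq, struct, s))
```

Because each position contributes its own additive term to the log-odds, a model of this form also makes it possible to ask which positions discriminate best between mature and non-mature windows, which is the kind of question behind the position 7, 8 and 9 finding reported above.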
This work describes a novel algorithm for identifying the start position of the mature miRNA(s) produced by miRNA precursors. Our tool performs significantly better (often twice as well) than two existing approaches and provides new insights into the potential use of specific sequence and structural information as recognition signals for Dicer processing. The web tool is available at http://mirna.imbb.forth.gr/MatureBayes.html.