School of Computer Science and McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada H3A2B2 and Laboratoire de bioinformatique du département informatique, Université du Québec à Montréal, Montreal, Quebec, Canada H2X3Y7.
Nucleic Acids Res. 2013 Aug;41(15):7200-11. doi: 10.1093/nar/gkt466. Epub 2013 Jun 8.
MicroRNAs (miRNAs) are short RNA species derived from hairpin-forming miRNA precursors (pre-miRNA) and acting as key posttranscriptional regulators. Most computational tools labeled as miRNA predictors are in fact pre-miRNA predictors and provide no information about the putative miRNA location within the pre-miRNA. Sequence and structural features that determine the location of the miRNA, and the extent to which these properties vary from species to species, are poorly understood. We have developed miRdup, a computational predictor for the identification of the most likely miRNA location within a given pre-miRNA or the validation of a candidate miRNA. MiRdup is based on a random forest classifier trained with experimentally validated miRNAs from miRBase, with features that characterize the miRNA-miRNA* duplex. Because we observed that miRNAs have sequence and structural properties that differ between species, mostly in terms of duplex stability, we trained various clade-specific miRdup models and obtained increased accuracy. MiRdup self-trains on the most recent version of miRBase and is easy to use. Combined with existing pre-miRNA predictors, it will be valuable for both de novo mapping of miRNAs and filtering of large sets of candidate miRNAs obtained from transcriptome sequencing projects. MiRdup is open source under the GPLv3 and available at http://www.cs.mcgill.ca/~blanchem/mirdup/.
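To make the task of locating a miRNA within a pre-miRNA concrete, the following is a minimal Python sketch of candidate enumeration and ranking. It is not miRdup's implementation: miRdup scores candidates with a trained random forest over duplex features, whereas this sketch substitutes a deliberately simple placeholder score (the fraction of base-paired positions in the candidate window). The function names `paired_fraction` and `rank_candidates`, the 19 to 24 nt length range, and the dot-bracket input format are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# A candidate is (score, start position, length); positions are 0-based.
Candidate = Tuple[float, int, int]


def paired_fraction(structure: str, start: int, length: int) -> float:
    """Placeholder score (an assumption, NOT miRdup's classifier):
    fraction of positions in the window that are base-paired in the
    dot-bracket secondary structure of the hairpin."""
    window = structure[start:start + length]
    return sum(ch in "()" for ch in window) / length


def rank_candidates(
    structure: str,
    min_len: int = 19,   # assumed lower bound on miRNA length
    max_len: int = 24,   # assumed upper bound on miRNA length
    score: Callable[[str, int, int], float] = paired_fraction,
) -> List[Candidate]:
    """Enumerate every window of an allowed length along the pre-miRNA
    and return the candidates sorted from best to worst score.
    Ties are broken by smaller start, then by shorter length."""
    n = len(structure)
    candidates: List[Candidate] = []
    for length in range(min_len, max_len + 1):
        for start in range(0, n - length + 1):
            candidates.append((score(structure, start, length), start, length))
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return candidates


if __name__ == "__main__":
    # Toy hairpin: 2 unpaired, 22 paired, 4-nt loop, 22 paired, 2 unpaired.
    toy = "." * 2 + "(" * 22 + "." * 4 + ")" * 22 + "." * 2
    best_score, best_start, best_len = rank_candidates(toy)[0]
    print(best_score, best_start, best_len)
```

In a realistic setting the `score` argument would be replaced by the probability output of a trained classifier evaluated on features of the miRNA-miRNA* duplex; the exhaustive enumeration of windows is the only part of this sketch meant to carry over.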