State Key Laboratory of Biocontrol, Key Laboratory of Gene Engineering of the Ministry of Education, Sun Yat-Sen University, Guangzhou, China.
RNA Biol. 2011 Sep-Oct;8(5):922-34. doi: 10.4161/rna.8.5.16026. Epub 2011 Sep 1.
microRNAs (miRNAs) are an abundant group of small regulatory non-coding RNAs in eukaryotes. Next-generation sequencing (NGS) technologies now allow small RNAs (sRNAs) to be detected systematically and genomes to be sequenced de novo, quickly and at low cost. As a result, there is a growing need for fast miRNA prediction tools that can accurately annotate miRNAs from many organisms using either the genome sequence or NGS data. Several miRNA predictors have been proposed for this purpose. However, existing predictors need better accuracy and better applicability across multiple species. Here, we present a novel prediction tool called mirExplorer. It is based on an integrated adaptive boosting method and contains two modules. The first module, mirExplorer-genome, predicts pre-miRNAs de novo from genome sequences. The second module, mirExplorer-NGS, discovers miRNAs from NGS data. We extracted a set of novel features describing pre-miRNA secondary structure and miRNA biogenesis to distinguish real pre-miRNAs from pseudo ones. Outer ten-fold cross-validation of mirExplorer-genome on human data yielded a specificity of 95.03% and a sensitivity of 93.71%. On test data from 16 species, mirExplorer-genome achieved an overall accuracy of 95.53%. Systematic outer ten-fold cross-validation of the mirExplorer-NGS model yielded a specificity of 98.3% and a sensitivity of 97.72%. The model's good performance held across test datasets from species ranging from vertebrates to plants. mirExplorer is available as both a web server and a software package at http://biocenter.sysu.edu.cn/mir/.
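The abstract describes an adaptive boosting classifier that separates real from pseudo pre-miRNAs. It reports results from ten-fold cross-validation as sensitivity and specificity. The sketch below illustrates those general ideas in plain Python. It uses textbook discrete AdaBoost over decision stumps, a plain (not nested) ten-fold split, and sensitivity/specificity pooled over the held-out folds. It is not mirExplorer's implementation. The authors' "integrated" boosting variant, their feature set and their outer validation scheme are not reproduced here. The function names and the two-feature toy data (hypothetical "normalized MFE" and "GC fraction" values) are assumptions made purely for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

# A decision stump: (feature index, threshold, polarity)
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Return polarity if x[j] <= threshold, else -polarity (+1 = real, -1 = pseudo)."""
    j, thr, pol = stump
    return pol if x[j] <= thr else -pol


def train_adaboost(X: List[List[float]], y: List[int], rounds: int = 20):
    """Textbook discrete AdaBoost over decision stumps; y holds +1 (real) / -1 (pseudo)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []  # list of (alpha, stump)
    for _ in range(rounds):
        best, best_err = None, float("inf")
        for j in range(len(X[0])):
            values = sorted({x[j] for x in X})
            # Candidate thresholds: midpoints between consecutive distinct values
            for thr in [(a + b) / 2 for a, b in zip(values, values[1:])]:
                for pol in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict((j, thr, pol), xi) != yi)
                    if err < best_err:
                        best, best_err = (j, thr, pol), err
        if best is None or best_err >= 0.5:
            break  # no stump does better than chance
        eps = max(best_err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - eps) / eps)
        model.append((alpha, best))
        if best_err == 0:
            break  # training data already perfectly separated
        # Up-weight misclassified samples, down-weight correct ones, renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x: Sequence[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in model)
    return 1 if score >= 0 else -1


def sens_spec(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def k_fold_cv(X, y, k: int = 10, rounds: int = 20, seed: int = 0):
    """Plain (non-nested) k-fold CV; metrics are pooled over all held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    y_true, y_pred = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = train_adaboost([X[i] for i in train], [y[i] for i in train], rounds)
        for i in fold:
            y_true.append(y[i])
            y_pred.append(predict(model, X[i]))
    return sens_spec(y_true, y_pred)


# Hypothetical toy features per hairpin: [normalized MFE, GC fraction]
X_toy = [[-0.40 - 0.01 * i, 0.50] for i in range(20)] + \
        [[-0.10 - 0.01 * i, 0.45] for i in range(20)]
y_toy = [1] * 20 + [-1] * 20
sens, spec = k_fold_cv(X_toy, y_toy, k=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The toy set is linearly separable on its first feature, so both metrics come out as 1.0. Real pre-miRNA data would not behave this way. A nested ("outer") scheme would wrap `k_fold_cv` around an inner loop that tunes parameters such as `rounds` on the training folds only.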