College of Biosciences and Biotechnology, Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan 70101, Taiwan.
Institute of Plant and Microbial Biology, Academia Sinica, Nankang, Taipei 115, Taiwan.
Bioinformatics. 2018 Apr 1;34(7):1108-1115. doi: 10.1093/bioinformatics/btx725.
MicroRNAs (miRNAs) are endogenous non-coding small RNAs (about 22 nucleotides long) that play an important role in the post-transcriptional regulation of gene expression through either mRNA cleavage or translational inhibition. Several machine learning-based approaches have been developed to identify novel miRNAs from next-generation sequencing (NGS) data. However, most of these methods require precursor or genomic sequences as references, and such sequences are often unavailable for non-model plants, which limits miRNA discovery in these species. A systematic approach for identifying novel miRNAs without reference sequences is therefore needed.
In this study, we developed an effective method for identifying miRNAs in non-model plants using only NGS datasets. The miRNA prediction model was trained with a support vector machine (SVM) algorithm on several duplex structure-related features of mature miRNAs and their passenger strands. On independent test sets, the model achieved accuracies of 96.61% for dicots (Arabidopsis) and 93.04% for monocots (rice). The method was also applied to real small RNA sequencing data from orchids: 21 predicted orchid miRNAs were selected for experimental validation, and 18 of them were confirmed by qRT-PCR. The approach has been implemented as a user-friendly program called microRPM (miRNA Prediction Model).
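The abstract does not list the exact duplex features microRPM uses. As a minimal sketch, assuming the mature miRNA and its passenger strand (miRNA*) form a duplex with 2-nt 3' overhangs on both strands, the code below computes a few pairing features of the kind an SVM could be trained on: Watson-Crick pairs, G:U wobble pairs, mismatches, the paired fraction, and the length difference between strands. The function name `duplex_features` and this particular feature set are hypothetical illustrations, not the published microRPM implementation.

```python
from dataclasses import dataclass

# Base-pair classes used to score each aligned position (RNA alphabet).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


@dataclass
class DuplexFeatures:
    watson_crick: int
    wobble: int
    mismatches: int
    paired_fraction: float
    length_difference: int


def duplex_features(mature: str, star: str, overhang: int = 2) -> DuplexFeatures:
    """Hypothetical duplex features for a mature miRNA / miRNA* pair.

    Assumes the two strands pair antiparallel with `overhang`-nt 3' overhangs
    on both ends, so mature[i] is aligned with star[len(star) - 1 - overhang - i].
    """
    # Normalise DNA-style reads (T) to RNA (U).
    m = mature.upper().replace("T", "U")
    s = star.upper().replace("T", "U")
    wc = wobble = mismatch = 0
    # The last `overhang` nucleotides of the mature strand are unpaired.
    for i in range(len(m) - overhang):
        j = len(s) - 1 - overhang - i  # partner position on the star strand
        if j < 0:
            break  # star strand is shorter; no more aligned positions
        pair = (m[i], s[j])
        if pair in WATSON_CRICK:
            wc += 1
        elif pair in WOBBLE:
            wobble += 1
        else:
            mismatch += 1
    aligned = wc + wobble + mismatch
    paired_fraction = (wc + wobble) / aligned if aligned else 0.0
    return DuplexFeatures(wc, wobble, mismatch, paired_fraction, abs(len(m) - len(s)))
```

Each candidate duplex would then be turned into a numeric vector (for example the five fields above) and passed to an SVM classifier such as scikit-learn's `SVC`; which features and kernel the authors actually used is not stated in the abstract, so this pipeline is an assumption.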
This resource is freely available at http://microRPM.itps.ncku.edu.tw.
Contact: nslin@sinica.edu.tw or sarah321@mail.ncku.edu.tw.
Supplementary data are available at Bioinformatics online.