Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, People's Republic of China.
PLoS One. 2011;6(11):e27422. doi: 10.1371/journal.pone.0027422. Epub 2011 Nov 16.
MicroRNAs (miRNAs) are a set of short (19-24 nt) non-coding RNAs that play significant roles as posttranscriptional regulators in animals and plants. Ab initio prediction methods show excellent performance in discovering new pre-miRNAs. While most of these methods can distinguish real pre-miRNAs from pseudo pre-miRNAs, few can predict the positions of miRNAs within them. Of the existing methods that do predict miRNA positions, most are designed for mammalian miRNAs, including those of human and mouse; only a minority can predict the positions of plant miRNAs. Accurate prediction of miRNA positions therefore remains a challenge, especially for plant miRNAs. This motivated us to develop MaturePred, a machine learning method based on a support vector machine, to predict the positions of plant miRNAs in new plant pre-miRNA candidates.
METHODOLOGY/PRINCIPAL FINDINGS: A miRNA:miRNA* duplex is treated as a whole to capture the binding characteristics of miRNAs. We extract position-specific, energy-related, structure-related, and stability-related features from real and pseudo miRNA:miRNA* duplexes. A set of informative features is selected to improve prediction accuracy. A two-stage sample selection algorithm is proposed to combat the serious imbalance between real and pseudo miRNA:miRNA* duplexes. The resulting method, MaturePred, accurately predicts plant miRNAs and achieves higher prediction accuracy than existing methods. Further, we trained a prediction model on animal data to predict animal miRNAs. This model also achieves higher prediction performance, which further confirms the effectiveness of our miRNA prediction method.
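To make the imbalance between real and pseudo candidates concrete, the following is a minimal sketch, not the MaturePred implementation. It assumes that candidates are simply all 19-24 nt windows along a precursor, and it ignores the miRNA* strand and the secondary structure that the actual method relies on. The function names, the placeholder sequence, and the annotated position are hypothetical.

```python
from typing import List, Tuple

# miRNA length range taken from the abstract (19-24 nt).
MIN_LEN, MAX_LEN = 19, 24


def enumerate_candidates(pre_mirna: str) -> List[Tuple[int, int]]:
    """Return every (start, length) window of 19-24 nt, with 0-based starts."""
    n = len(pre_mirna)
    return [
        (start, length)
        for length in range(MIN_LEN, MAX_LEN + 1)
        for start in range(0, n - length + 1)
    ]


def label_candidates(
    pre_mirna: str, true_start: int, true_length: int
) -> List[Tuple[Tuple[int, int], int]]:
    """Label the annotated window as real (1) and every other window as pseudo (0)."""
    return [
        (window, int(window == (true_start, true_length)))
        for window in enumerate_candidates(pre_mirna)
    ]


# Hypothetical 80-nt precursor and annotation, for illustration only.
precursor = "ACGU" * 20
labeled = label_candidates(precursor, true_start=10, true_length=21)

n_real = sum(label for _, label in labeled)
n_pseudo = len(labeled) - n_real
print(f"real: {n_real}, pseudo: {n_pseudo}")
```

Under these assumptions a single precursor yields one real candidate against several hundred pseudo candidates, which is the kind of skew a sample selection step has to address before a classifier is trained.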
The superior performance of the proposed prediction model can be attributed to the extracted features of plant miRNAs and miRNA*s, the selected training dataset, and the carefully selected features. The web service of MaturePred, the training datasets, the testing datasets, and the selected features are freely available at http://nclab.hit.edu.cn/maturepred/.