College of Bioinformatics Science and Technology and Bio-pharmaceutical Key Laboratory of Heilongjiang Province, Harbin Medical University, Harbin 150081, PR China.
Genomics. 2011 Aug;98(2):73-8. doi: 10.1016/j.ygeno.2011.04.011. Epub 2011 May 14.
MicroRNAs (miRNAs) are non-coding RNAs that play important roles in post-transcriptional regulation. Identifying miRNAs is crucial to understanding their biological mechanisms. Recently, machine-learning approaches have been employed to predict miRNA precursors (pre-miRNAs). However, the features used differ between methods and consequently yield different performance, so feature selection is critical for pre-miRNA prediction. Using a hybrid of a genetic algorithm and a support vector machine (GA-SVM), we generated an optimized subset of 13 features. In five-fold cross-validation with an SVM classifier, the optimized feature subset performed much better than the two feature sets used in microPred and miPred. Finally, we constructed the classifier miR-SF and used it to predict the most recently identified human pre-miRNAs in miRBase (version 16). Compared with microPred and miPred, miR-SF achieved much higher classification performance: accuracies were 93.97%, 86.21% and 64.66% for miR-SF, microPred and miPred, respectively. Thus, miR-SF is effective for identifying pre-miRNAs.
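The wrapper-style feature selection described in the abstract (a genetic algorithm searching over feature subsets, each scored by cross-validated classification accuracy) can be illustrated with a minimal sketch. This is not the authors' implementation. To stay dependency-free it swaps the SVM for a simple nearest-centroid classifier, runs on synthetic data, and uses hypothetical names throughout (`make_toy_data`, `cv_accuracy`, `ga_select`). The population size, mutation rate, and the small per-feature penalty in the fitness function are assumptions chosen for illustration, not values from the paper.

```python
import random


def make_toy_data(n=100, n_features=8, informative=(0, 3), seed=1):
    """Synthetic two-class data; only the `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label  # shift class 1 on informative features
        X.append(row)
        y.append(label)
    return X, y


def cv_accuracy(X, y, mask, k=5):
    """k-fold CV accuracy using only the features switched on in `mask`.

    Assumption: a nearest-centroid classifier stands in for the SVM.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        centroids = {}
        for label in set(y[i] for i in train):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        for i in test:
            def sq_dist(label):
                return sum((X[i][j] - c) ** 2
                           for j, c in zip(cols, centroids[label]))
            correct += min(centroids, key=sq_dist) == y[i]
    return correct / len(X)


def ga_select(X, y, n_features, pop_size=20, generations=15,
              mutation_rate=0.05, penalty=0.01, seed=0):
    """Search for a feature mask with a simple elitist genetic algorithm."""
    rng = random.Random(seed)

    def fitness(mask):
        # Assumption: reward accuracy, lightly penalize larger subsets.
        return cv_accuracy(X, y, mask) - penalty * sum(mask)

    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]  # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


X, y = make_toy_data()
best = ga_select(X, y, n_features=8)
print("selected features:", [j for j, keep in enumerate(best) if keep])
print("5-fold CV accuracy:", cv_accuracy(X, y, best))
```

In a realistic setting, the fitness function would call an SVM trained on the candidate feature columns, and the rows of `X` would hold sequence and structure features computed from pre-miRNA hairpins rather than random numbers.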