MiRenSVM: towards better prediction of microRNA precursors using an ensemble SVM classifier with multi-loop features.

Affiliation

Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200433, China.

Publication information

BMC Bioinformatics. 2010 Dec 14;11 Suppl 11(Suppl 11):S11. doi: 10.1186/1471-2105-11-S11-S11.

Abstract

BACKGROUND

MicroRNAs (miRNAs) are derived from larger hairpin RNA precursors and play essential regulatory roles in both animals and plants. A number of computational methods for finding miRNA genes have been proposed in the past decade, yet the problem is far from solved, especially given the imbalance between known miRNAs and unidentified miRNAs, and the existence of pre-miRNAs with multi-loops or higher minimum free energy (MFE). This paper presents a new computational approach, miRenSVM, for finding miRNA genes. To improve prediction performance, an ensemble support vector machine (SVM) classifier is built to deal with the imbalance issue, and multi-loop features are included to identify pre-miRNAs with multi-loops.
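One common way to build an ensemble against class imbalance is to split the majority class into chunks, pair each chunk with the full minority class, train one member classifier per pair, and combine the members by voting. The sketch below illustrates only that partition-and-vote skeleton. It is an assumption about how such an ensemble could be organised, not a description of how miRenSVM is actually constructed; the function names, the round-robin split, and the majority-vote rule are all hypothetical, and the member SVMs themselves are left out.

```python
from collections import Counter

def partition_negatives(negatives, n_parts):
    """Split the majority class into n_parts near-equal chunks (round-robin)."""
    return [negatives[i::n_parts] for i in range(n_parts)]

def build_training_sets(positives, negatives, n_parts):
    """Pair every chunk of negatives with the full set of positives."""
    return [(positives, chunk) for chunk in partition_negatives(negatives, n_parts)]

def majority_vote(member_predictions):
    """Combine member labels (+1 = real, -1 = pseudo); ties resolve to -1."""
    counts = Counter(member_predictions)
    return 1 if counts[1] > counts[-1] else -1

# Placeholder identifiers standing in for the 697 real and 5428 pseudo precursors.
positives = [f"real_{i}" for i in range(697)]
negatives = [f"pseudo_{i}" for i in range(5428)]

# Assumed heuristic: one member per chunk of roughly positive-set size.
n_parts = round(len(negatives) / len(positives))
training_sets = build_training_sets(positives, negatives, n_parts)

for pos, neg in training_sets:
    print(len(pos), len(neg))
```

Each printed pair is close to balanced, which is the point of the split: every member sees all of the positives but only a slice of the negatives.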

RESULTS

We collected a representative dataset containing 697 real miRNA precursors identified by experimental procedures and other computational methods, and 5428 pseudo precursors from several datasets. Experiments showed that miRenSVM achieved 96.5% specificity and 93.05% sensitivity on this dataset. Compared with state-of-the-art approaches, miRenSVM obtained better prediction results. We also applied our method to 14 Homo sapiens pre-miRNAs and 13 Anopheles gambiae pre-miRNAs that first appeared in miRBase13.0; miRenSVM achieved a 100% prediction rate. Furthermore, performance evaluation was conducted over 27 additional species in miRBase13.0, and 92.84% (4863/5238) of animal pre-miRNAs were correctly identified by miRenSVM.
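For readers less familiar with the metrics, the following minimal sketch spells out the standard definitions of sensitivity and specificity and re-derives the reported cross-species rate from the counts given above. The function names are illustrative; only the figures 4863 and 5238 come from the abstract, and no confusion-matrix counts for the 96.5% / 93.05% results are assumed.

```python
def sensitivity(tp, fn):
    """Fraction of real pre-miRNAs that are predicted as real: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of pseudo pre-miRNAs that are predicted as pseudo: TN / (TN + FP)."""
    return tn / (tn + fp)

# Counts reported for animal pre-miRNAs across 27 additional species.
correct, total = 4863, 5238

# All evaluated sequences here are real pre-miRNAs, so the rate is a sensitivity.
rate = sensitivity(tp=correct, fn=total - correct)
print(f"{100 * rate:.2f}%")
```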

CONCLUSION

MiRenSVM is an ensemble support vector machine (SVM) classification system for better detection of miRNA genes, especially those with multi-loop secondary structure.
