Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences.

Author information

ElGokhy Sherin M, ElHefnawi Mahmoud, Shoukry Amin

Affiliations

Department of Computer Science and Engineering, Egypt-Japan University of Science and Technology (E-JUST), 21934, New Borg El-Arab, Alexandria, Egypt.

Publication information

BMC Res Notes. 2014 May 6;7:286. doi: 10.1186/1756-0500-7-286.

Abstract

BACKGROUND

MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that have been identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow because miRNAs are difficult to isolate by cloning, owing to their low expression, low stability and tissue specificity, and to the high cost of the cloning procedure. Computational identification of miRNAs from genomic sequences therefore provides a valuable complement to cloning. Different approaches for identifying miRNAs have been proposed, based on homology, thermodynamic parameters and cross-species comparisons.

RESULTS

This paper focuses on integrating miRNA classifiers into a meta-classifier and on identifying miRNAs in metagenomic sequences collected from different environments. An ensemble is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR), which use non-identical features and have been trained on different data. Their decisions are combined using a single-hidden-layer neural network to increase prediction accuracy. The ensemble classifier achieved 89.3% accuracy, 82.2% F-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo sequence data. The area under its receiver operating characteristic curve is 0.9, which indicates high performance. The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR, and a minor refinement over MiPred. The ensemble classifier was then used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 candidates were identified as highly probable miRNAs. The approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs.
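The six reported rates are all functions of one confusion matrix, so they can be cross-checked against each other. The sketch below uses the standard definitions; the counts passed to `binary_metrics` are hypothetical (the abstract does not state the test-set size) and were chosen only because a 1:2 ratio of real to pseudo sequences reproduces the reported percentages.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: fraction of real miRNAs recovered
    specificity = tn / (tn + fp)   # fraction of pseudo hairpins rejected
    precision = tp / (tp + fp)     # fraction of positive calls that are real
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F-measure is the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "f_measure": f_measure,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
    }


# Hypothetical counts (NOT from the paper): 100 real and 200 pseudo sequences.
metrics = binary_metrics(tp=74, fp=6, tn=194, fn=26)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

Under these assumed counts the high specificity and lower sensitivity show the trade-off the numbers imply: few pseudo hairpins are accepted, at the cost of missing roughly a quarter of the real miRNAs.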

CONCLUSIONS

The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of highly promising miRNA hairpin candidates for follow-up by cloning. Among the ensemble's predictions are pre-miRNA candidates that have been validated against miRBase even though some of the base classifiers did not recognize them.
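A combiner of this kind can accept a hairpin that some base classifiers reject, because it learns how much weight each classifier's decision deserves. The following is a minimal sketch of the idea, not the authors' implementation: `TinyMLP`, `simulate_votes`, the hidden-layer size, the learning rate and the per-classifier reliabilities are all hypothetical, and the four base classifiers are emulated by noisy binary votes rather than by running the real tools.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """Single-hidden-layer network: 4 base-classifier votes -> 1 score."""

    def __init__(self, n_in=4, n_hidden=3, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict_proba(self, x):
        return self.forward(x)[1]

    def train(self, X, Y, epochs=200, lr=0.1):
        """Plain stochastic gradient descent on the cross-entropy loss."""
        for _ in range(epochs):
            for x, target in zip(X, Y):
                hidden, out = self.forward(x)
                d_out = out - target  # gradient w.r.t. output pre-activation
                for j, hj in enumerate(hidden):
                    # Back-propagate through hidden unit j before updating w2[j].
                    d_hid = d_out * self.w2[j] * hj * (1.0 - hj)
                    self.w2[j] -= lr * d_out * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hid * xi
                    self.b1[j] -= lr * d_hid
                self.b2 -= lr * d_out


def simulate_votes(n, reliabilities, seed=1):
    """Emulate base classifiers: each votes the true label with its own probability."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([label if rng.random() < r else 1 - label
                  for r in reliabilities])
        Y.append(label)
    return X, Y


# Hypothetical reliabilities for four emulated base classifiers.
RELIABILITIES = [0.75, 0.90, 0.70, 0.80]
X, Y = simulate_votes(300, RELIABILITIES)

meta = TinyMLP()
meta.train(X, Y)

# A split decision: only the second and fourth emulated classifiers say "miRNA".
print("split vote score:", meta.predict_proba([0, 1, 0, 1]))
print("unanimous yes:   ", meta.predict_proba([1, 1, 1, 1]))
print("unanimous no:    ", meta.predict_proba([0, 0, 0, 0]))
```

How a split vote is scored depends entirely on the learned weights, which is the point of using a trained combiner instead of a fixed majority rule.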
