ElGokhy Sherin M, ElHefnawi Mahmoud, Shoukry Amin
Department of Computer Science and Engineering, Egypt-Japan University of Science and Technology (E-JUST), 21934, New Borg El-Arab, Alexandria, Egypt.
BMC Res Notes. 2014 May 6;7:286. doi: 10.1186/1756-0500-7-286.
MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that have been identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow because miRNAs are difficult to isolate by cloning, owing to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Computational identification of miRNAs from genomic sequences therefore provides a valuable complement to cloning. Different approaches to miRNA identification have been proposed, based on homology, thermodynamic parameters, and cross-species comparisons.
The present paper focuses on the integration of miRNA classifiers into a meta-classifier and on the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR), which use non-identical features and were trained on different data. Their decisions are combined using a single-hidden-layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% F-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9, which indicates high performance. The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred. The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 candidates have been identified as highly probable miRNAs. Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs.
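The combination step described above, in which a single-hidden-layer neural network merges the outputs of four base classifiers, can be illustrated with a minimal sketch. This is not the authors' implementation: the base-classifier scores are synthetic stand-ins for Triplet-SVM, MiPred, Virgo and EumiR outputs, and the function names, hidden-layer size, learning rate and number of epochs are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_synthetic_scores(n=400, seed=1):
    """Hypothetical stand-in for the four base classifiers' outputs.

    Each column is a noisy score in [0, 1] that is only weakly
    informative about the true label (1 = real hairpin, 0 = pseudo).
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    noise = rng.normal(0.0, 0.35, size=(n, 4))
    X = np.clip(0.3 + 0.4 * y[:, None] + noise, 0.0, 1.0)
    return X, y.astype(float)


def train_combiner(X, y, hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a single-hidden-layer network with full-batch gradient descent.

    X: (n_samples, n_base) base-classifier scores; y: (n_samples,) 0/1 labels.
    The loss is the mean binary cross-entropy.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                   # hidden activations
        p = sigmoid(H @ W2 + b2)                   # ensemble probability
        d_out = (p - y) / n                        # grad w.r.t. output pre-activation
        d_hid = np.outer(d_out, W2) * H * (1 - H)  # back-propagated to hidden layer
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum()
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2


def predict_proba(params, X):
    """Return the combined probability that each candidate is a real hairpin."""
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)


if __name__ == "__main__":
    X, y = make_synthetic_scores()
    params = train_combiner(X[:300], y[:300])
    p = predict_proba(params, X[300:])
    ensemble_acc = np.mean((p >= 0.5) == (y[300:] == 1))
    base_acc = [np.mean((X[300:, j] >= 0.5) == (y[300:] == 1)) for j in range(4)]
    print("base accuracies:", np.round(base_acc, 3))
    print("ensemble accuracy:", round(float(ensemble_acc), 3))
```

Because the network weights the four scores jointly, it can accept a candidate that one or more base classifiers reject, provided the remaining evidence is strong enough. The synthetic data here only demonstrates the mechanics and says nothing about the performance reported in the paper.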
The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of highly promising miRNA hairpin candidates for cloning-based methods. Among the ensemble's predictions are pre-miRNA candidates that have been validated using miRBase although they were not recognized by some of the base classifiers.
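The reported performance figures can be cross-checked against one another, since the F-measure is the harmonic mean of precision and sensitivity. The short sketch below uses only the values quoted in the abstract; the function name `f_measure` is an illustrative choice, not part of the authors' tool.

```python
def f_measure(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


if __name__ == "__main__":
    # Values quoted in the abstract: 92.5% precision, 74% sensitivity.
    f = f_measure(0.925, 0.74)
    print(f"{f:.3f}")  # 0.822, matching the reported 82.2% F-measure
```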