Ru Xiaoqing, Cao Peigang, Li Lihong, Zou Quan
Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China.
Department of Cardiology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China.
Mol Ther Nucleic Acids. 2019 Dec 6;18:16-23. doi: 10.1016/j.omtn.2019.07.019. Epub 2019 Aug 5.
Among the large number of known microRNAs (miRNAs), some miRNAs play negligible roles in cell regulation. Therefore, selecting essential miRNAs is an important initial step for a deeper understanding of miRNAs and their functions. In this study, we generated 60 classification models by combining 12 representative feature extraction methods with 5 commonly used classification algorithms. The optimal model for essential miRNA classification that we obtained is based on the Mismatch feature extraction method combined with the random forest algorithm. The F-Measure, area under the curve, and accuracy values of this model were 93.2%, 96.7%, and 93.0%, respectively. We also found that the distribution of the positive and negative examples of the first few features greatly influenced the classification results. The feature extraction methods performed best when the differences between the positive and negative examples were obvious, and this led to better classification of essential miRNAs. Because different classifiers may give different predictions for the same sample, we employed a novel voting method to improve the accuracy of the classification of essential miRNAs. The performance results showed that the best classification results were obtained when five classification models were used in the voting. The five classification models were constructed based on the Mismatch, pseudo-distance structure status pair composition, Subsequence, Kmer, and Triplet feature extraction methods. With these five models, voting reached an accuracy of 95.3%. Our results suggest that the voting method can be an important tool for selecting essential miRNAs.
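The idea of combining several classifiers' predictions by voting can be illustrated with a minimal sketch. The abstract does not specify the rule used by the authors' voting method, so the code below assumes plain majority voting over binary labels; the function name `majority_vote` and the toy predictions are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label predictions by simple majority.

    predictions[m][i] is the label that model m assigns to sample i
    (assumed encoding: 1 = essential miRNA, 0 = non-essential).
    With an odd number of models and binary labels, ties cannot occur.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same samples")

    voted = []
    # zip(*predictions) yields, for each sample, the votes of all models.
    for sample_votes in zip(*predictions):
        label, _count = Counter(sample_votes).most_common(1)[0]
        voted.append(label)
    return voted


# Hypothetical toy predictions from five models on three samples.
toy_predictions = [
    [1, 0, 1],  # model 1
    [1, 1, 0],  # model 2
    [0, 0, 1],  # model 3
    [1, 0, 0],  # model 4
    [0, 1, 1],  # model 5
]
ensemble_labels = majority_vote(toy_predictions)
```

In this toy input each sample receives three votes for one label and two for the other, so the ensemble follows the three-vote side even though no single model agrees with it on every sample.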