RFMirTarget: predicting human microRNA target genes with a random forest classifier.

Affiliation

Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil.

Publication information

PLoS One. 2013 Jul 26;8(7):e70153. doi: 10.1371/journal.pone.0070153. Print 2013.

Abstract

MicroRNAs are key regulators of eukaryotic gene expression whose fundamental role has already been identified in many cell pathways. The correct identification of miRNA targets remains a major challenge in bioinformatics and has motivated the development of several computational methods to overcome the inherent limitations of experimental analysis. Indeed, the best results reported so far in terms of specificity and sensitivity are associated with machine learning-based methods for microRNA-target prediction. Following this trend, in the current paper we discuss and explore a microRNA-target prediction method based on a random forest classifier, namely RFMirTarget. Despite their well-known robustness in general classification tasks, to the best of our knowledge, random forests have not been deeply explored in the specific context of predicting microRNA targets. Our framework first analyzes alignments between candidate microRNA-target pairs and extracts a set of structural, thermodynamic, alignment, seed and position-based features, upon which classification is performed. Experiments have shown that RFMirTarget outperforms several well-known classifiers with statistical significance, and that its performance is not impaired by the class imbalance problem or by feature correlation. Moreover, comparing it against other algorithms for microRNA target prediction using independent test data sets from TarBase and starBase, we observe a very promising performance, with higher sensitivity than other methods. Finally, tests performed with RFMirTarget show the benefits of feature selection even for a classifier with embedded feature importance analysis, and the consistency between the relevant features identified and important biological properties for effective microRNA-target gene alignment.
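The feature-extraction step described in the abstract can be illustrated with a minimal sketch. This is not the RFMirTarget implementation: the function names, the feature names, the ungapped antiparallel alignment, and the choice of miRNA positions 2 to 8 as the seed region are all assumptions made for illustration. The sketch covers only simple seed and position-based pairing counts, not the structural or thermodynamic features the abstract mentions.

```python
# Illustrative sketch only; names and feature definitions are hypothetical,
# not taken from RFMirTarget.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(mirna_base, target_base):
    """Classify one aligned base pair as 'wc', 'gu' (wobble) or 'mismatch'."""
    pair = (mirna_base, target_base)
    if pair in WATSON_CRICK:
        return "wc"
    if pair in WOBBLE:
        return "gu"
    return "mismatch"


def seed_features(mirna, target_site, seed_start=2, seed_end=8):
    """Count pairing types for a miRNA and a candidate target site.

    Both sequences are given 5'->3'. The alignment is assumed to be
    ungapped and antiparallel: miRNA position 1 pairs with the last
    base of the target site, position 2 with the second-to-last, etc.
    The seed region (positions seed_start..seed_end, 1-based,
    inclusive) is an assumed convention.
    """
    mirna = mirna.upper().replace("T", "U")
    # Reverse the target so index i lines up with miRNA index i.
    rev_target = target_site.upper().replace("T", "U")[::-1]
    n = min(len(mirna), len(rev_target))
    types = [pair_type(mirna[i], rev_target[i]) for i in range(n)]
    seed = types[seed_start - 1:seed_end]

    features = {
        "seed_wc": seed.count("wc"),
        "seed_gu": seed.count("gu"),
        "seed_mismatch": seed.count("mismatch"),
        "total_wc": types.count("wc"),
        "total_gu": types.count("gu"),
        "total_mismatch": types.count("mismatch"),
    }
    # Position-based features: the pairing type at each miRNA position.
    for pos, kind in enumerate(types, start=1):
        features["pos_%d" % pos] = kind
    return features
```

Calling `seed_features(mirna, site)` on each candidate pair yields one feature dictionary per pair. Under the same assumptions, these could be encoded numerically and passed, together with positive and negative labels, to any random forest implementation for training.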
