Machine Intelligence Unit, Indian Statistical Institute, Kolkata, West Bengal, India.
PLoS One. 2011;6(9):e24583. doi: 10.1371/journal.pone.0024583. Epub 2011 Sep 15.
BACKGROUND: Machine learning based miRNA-target prediction algorithms often fail to achieve balanced prediction accuracy in terms of both sensitivity and specificity, owing to the lack of gold-standard negative examples, of relevant miRNA-targeting site context-specific features, and of an efficient feature selection process. Moreover, none of the sequence, structure, or machine learning based algorithms distributes the true positive predictions preferentially at the top of the ranked list, which makes them unreliable for biologists. In addition, these algorithms fail to obtain a good combination of precision and recall for target transcripts that are translationally repressed at the protein level.
METHODOLOGY/PRINCIPAL FINDING: In this article, we introduce MultiMiTar, an efficient miRNA-target prediction system: a Support Vector Machine (SVM) based classifier integrated with a multiobjective metaheuristic based feature selection technique. The robust performance of the proposed method results mainly from the use of high-quality negative examples and the selection of biologically relevant miRNA-targeting site context-specific features. The features are selected with a novel feature selection technique, AMOSA-SVM, which integrates the multiobjective optimization technique Archived Multi-Objective Simulated Annealing (AMOSA) with SVM.
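The archive idea behind multiobjective feature selection can be illustrated with a minimal sketch. It is not the authors' AMOSA-SVM implementation: it covers only the bookkeeping of non-dominated feature subsets, and leaves out the simulated annealing acceptance step and the SVM training. The function names `dominates` and `update_archive`, the choice of two objectives (assumed here to be sensitivity and specificity, both maximized), and all scores are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Scores = Tuple[float, ...]             # objective values, all to be maximized
Entry = Tuple[FrozenSet[int], Scores]  # (selected feature indices, scores)


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def update_archive(archive: List[Entry], candidate: Entry) -> List[Entry]:
    """Offer one candidate feature subset to the archive of non-dominated subsets."""
    _, cand_scores = candidate
    # Reject the candidate if an archived subset is at least as good everywhere.
    if any(dominates(s, cand_scores) or s == cand_scores for _, s in archive):
        return archive
    # Otherwise drop every archived subset that the candidate dominates.
    kept = [(f, s) for f, s in archive if not dominates(cand_scores, s)]
    kept.append(candidate)
    return kept


# Hypothetical (sensitivity, specificity) scores; in practice each pair would
# come from training and validating a classifier on the given feature subset.
candidates: List[Entry] = [
    (frozenset({0, 1}), (0.70, 0.60)),
    (frozenset({0, 2}), (0.65, 0.80)),
    (frozenset({1, 2}), (0.60, 0.55)),     # dominated by the first subset
    (frozenset({0, 1, 2}), (0.75, 0.65)),  # dominates the first subset
]

archive: List[Entry] = []
for entry in candidates:
    archive = update_archive(archive, entry)

for features, scores in archive:
    print(sorted(features), scores)
```

The archive that remains holds only trade-off solutions: no retained subset is beaten on both objectives by another, which is the property a multiobjective search relies on when it returns a set of candidate feature subsets rather than a single one.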
CONCLUSIONS/SIGNIFICANCE: On a completely independent test data set, MultiMiTar achieves a much higher Matthews correlation coefficient (MCC) of 0.583 and average class-wise accuracy (ACA) of 0.8 than the other target prediction methods, whose MCC and ACA values range from -0.269 to 0.155 and from 0.321 to 0.582, respectively. Moreover, it shows a more balanced result in terms of precision and sensitivity (recall) on the translationally repressed data set than all the other existing methods. Importantly, the true positive predictions are distributed preferentially at the top of the ranked list, which makes MultiMiTar reliable for biologists. MultiMiTar is available as an online tool at www.isical.ac.in/~bioinfo_miu/multimitar.htm, and the software can be downloaded from www.isical.ac.in/~bioinfo_miu/multimitar-download.htm.
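For readers unfamiliar with the two reported metrics, the following sketch shows how they can be computed from confusion-matrix counts. It assumes that ACA is the mean of sensitivity and specificity (the usual reading of "average class-wise accuracy"; the abstract does not define it). The function names and the counts are illustrative only and are not taken from the paper's test set.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def aca(tp: int, fp: int, fn: int, tn: int) -> float:
    """Average class-wise accuracy, assumed here to be (sensitivity + specificity) / 2."""
    sensitivity = tp / (tp + fn)  # accuracy on the positive class
    specificity = tn / (tn + fp)  # accuracy on the negative class
    return (sensitivity + specificity) / 2


# Illustrative counts: 50 true targets and 50 non-targets.
tp, fp, fn, tn = 40, 10, 10, 40
print(f"MCC = {mcc(tp, fp, fn, tn):.3f}")
print(f"ACA = {aca(tp, fp, fn, tn):.3f}")
```

Both metrics penalize a classifier that performs well on only one class: MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, while an ACA of 0.5 corresponds to chance on a two-class problem.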