IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1264-1273. doi: 10.1109/TCBB.2017.2670558. Epub 2017 Feb 16.
Protein methylation, an important post-translational modification, plays crucial roles in many cellular processes. Accurate prediction of protein methylation sites is fundamentally important for revealing the molecular mechanisms underlying methylation. In recent years, computational prediction based on machine learning algorithms has emerged as a powerful and robust approach for identifying methylation sites, and much progress has been made in improving predictive performance. However, the overall accuracy of existing methods remains unsatisfactory. Motivated by this, we propose a novel random-forest-based predictor called MePred-RF, which integrates several discriminative sequence-based feature descriptors and improves feature representation capability using a powerful feature selection technique. Importantly, unlike other methods that rely on multiple, complex information inputs, MePred-RF is based on sequence information alone. Comparative studies on benchmark datasets using rigorous jackknife tests indicate that MePred-RF remarkably outperforms other state-of-the-art predictors, improving overall accuracy by 4.5 percent on average. A user-friendly webserver that implements the proposed method has been established for researchers' convenience and is freely available for public use at http://server.malab.cn/MePred-RF. We anticipate that this research tool will be useful for the large-scale prediction and analysis of protein methylation sites.
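The jackknife test mentioned in the abstract is a leave-one-out evaluation: each sample is held out once, a model is trained on the remaining samples, and overall accuracy is the fraction of held-out samples predicted correctly. The following is a minimal sketch of that procedure only, not the MePred-RF implementation. It makes several assumptions that are not taken from the paper: amino acid composition stands in for the unnamed sequence-based descriptors, a nearest-centroid rule stands in for the random forest, and the sequence windows are made-up toy data. All function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(window):
    """Amino acid composition: fraction of each of the 20 residues in the window.

    Assumption: used here only as an example of a sequence-based descriptor.
    """
    counts = Counter(window)
    return [counts[aa] / len(window) for aa in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(train_x, train_y, query):
    """Stand-in classifier (not a random forest): label of the nearest class centroid."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        dist = squared_distance(centroid(members), query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(features, labels):
    """Hold out each sample once, train on the rest, and return overall accuracy."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Made-up toy windows (not real methylation data): 1 = site, 0 = non-site.
toy_windows = ["GGRGGRGGR", "RGGRGGFGG", "GRGGRGGYG",
               "LLKAEELLK", "AELKKLLEE", "KELLAAKLE"]
toy_labels = [1, 1, 1, 0, 0, 0]

toy_features = [composition(w) for w in toy_windows]
print(f"jackknife accuracy: {jackknife_accuracy(toy_features, toy_labels):.2f}")
```

Because every sample is tested exactly once and there is no random split, the jackknife gives a single deterministic accuracy for a given dataset and deterministic classifier, which is why it is often used to compare predictors on the same benchmark. With a randomized learner such as a random forest, a fixed seed would be needed for the same reproducibility.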