Manavalan Balachandran, Hasan Md Mehedi, Basith Shaherin, Gosu Vijayakumar, Shin Tae-Hwan, Lee Gwang
Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea.
Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan.
Mol Ther Nucleic Acids. 2020 Sep 16;22:406-420. doi: 10.1016/j.omtn.2020.09.010. eCollection 2020 Dec 4.
DNA N4-methylcytosine (4mC) is a crucial epigenetic modification involved in various biological processes. Accurate genome-wide identification of these sites is critical for improving our understanding of their biological functions and mechanisms. As experimental methods for 4mC identification are tedious, expensive, and labor-intensive, several machine learning-based approaches have been developed for genome-wide detection of such sites in multiple species. However, the predictions made by these tools are difficult to quantify and compare. To date, no systematic performance comparison of 4mC tools has been reported. The aim of this study was to compare and critically evaluate 12 publicly available 4mC site prediction tools according to species specificity, based on a large independent validation dataset. The tools 4mCCNN (), DNA4mC-LIP (), iDNA-MS (), DNA4mC-LIP and 4mCCNN (), and four tools for achieved excellent overall performance compared with their counterparts. However, none of the existing methods was suitable for , , and , thereby limiting their practical applicability. Model transferability to five species and non-transferability to three species are also discussed. The presented evaluation will assist researchers in selecting the prediction tools that best suit their purpose and provides useful guidelines for the development of improved 4mC predictors in the future.