Lv Hao, Dao Fu-Ying, Zhang Dan, Guan Zheng-Xing, Yang Hui, Su Wei, Liu Meng-Lu, Ding Hui, Chen Wei, Lin Hao
Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China; Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China.
iScience. 2020 Apr 24;23(4):100991. doi: 10.1016/j.isci.2020.100991. Epub 2020 Mar 19.
5-hydroxymethylcytosine (5hmC), 6-methyladenine (6mA), and 4-methylcytosine (4mC) are three common DNA modifications involved in various biological processes. Accurate genome-wide identification of these sites is invaluable for better understanding their biological functions. Because experimental methods are labor-intensive and expensive, computational methods for genome-wide detection of these sites are urgently needed. With this in mind, the current study was devoted to constructing a computational method to identify 5hmC, 6mA, and 4mC sites. We first encoded the samples using three feature schemes: K-tuple nucleotide component, nucleotide chemical property combined with nucleotide frequency, and mono-nucleotide binary encoding. A random forest was then used to identify 5hmC, 6mA, and 4mC sites. Cross-validation results showed that the proposed method has excellent generalization ability in identifying the three types of modification site. Based on the proposed model, a web server called iDNA-MS was established and is freely accessible at http://lin-group.cn/server/iDNA-MS.
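The three encoding schemes named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything here is an assumption based on commonly used definitions: the function names are hypothetical, the chemical-property triples (ring structure, functional group, hydrogen bonding) and the A, C, G, T ordering of the binary code are assumed conventions, "nucleotide frequency" is read as the running frequency of each base within the sequence prefix, and sequences are assumed to contain only A, C, G, and T. It is not the iDNA-MS implementation.

```python
from itertools import product

ALPHABET = "ACGT"

# Assumed chemical-property triples:
# (ring structure, functional group, hydrogen bonding)
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def kmer_composition(seq, k=2):
    """Frequency of every possible k-tuple, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[m] / total for m in kmers]


def ncp_with_density(seq):
    """Per position: 3 chemical-property bits plus the running
    frequency of that base in the prefix ending at the position."""
    features = []
    seen = dict.fromkeys(ALPHABET, 0)
    for i, base in enumerate(seq, start=1):
        seen[base] += 1
        features.extend(NCP[base])
        features.append(seen[base] / i)
    return features


def binary_encoding(seq):
    """One-hot code per position, in assumed A, C, G, T order."""
    features = []
    for base in seq:
        features.extend(1 if base == a else 0 for a in ALPHABET)
    return features


def encode(seq, k=2):
    """Concatenate the three feature groups into one vector."""
    seq = seq.upper()
    return kmer_composition(seq, k) + ncp_with_density(seq) + binary_encoding(seq)
```

For a sequence of length L, `encode` returns a vector of 4^k + 4L + 4L values. Vectors of this kind, computed for fixed-length windows centred on candidate sites, could then be passed to a random forest classifier (for example, scikit-learn's `RandomForestClassifier`); the window length, the value of k, and the forest settings used by the authors are not stated in the abstract.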