Zhou Jian, Bo Suling, Wang Hao, Zheng Lei, Liang Pengfei, Zuo Yongchun
State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China.
College of Computer and Information, Inner Mongolia Medical University, Hohhot, China.
Front Cell Dev Biol. 2021 Jul 16;9:707938. doi: 10.3389/fcell.2021.707938. eCollection 2021.
The 2-oxoglutarate/Fe(II)-dependent (2OG) oxygenase superfamily is mainly responsible for protein modification, nucleic acid repair and/or modification, and fatty acid metabolism, and plays important roles in cancer, cardiovascular disease, and other diseases. Its members are likely to become new targets for the treatment of cancer and other diseases, so the accurate identification of 2OG oxygenases is of great significance. Many computational methods have been proposed to predict functional proteins and so compensate for time-consuming and expensive experimental identification. However, machine learning has not yet been applied to the study of 2OG oxygenases. In this study, we developed OGFE_RAAC, a prediction model that identifies whether a protein is a 2OG oxygenase. To improve the performance of OGFE_RAAC, protein sequences were recoded with 673 reduced amino acid alphabets to determine the optimal feature representation scheme. A 10-fold cross-validation test showed that the model identifies 2OG oxygenases with an accuracy of 91.04%. In addition, results on an independent dataset indicated that the model generalizes well and is robust. It is expected to become an effective tool for the identification of 2OG oxygenases. Further analysis suggested that the function of 2OG oxygenases may be related to their polarity and hydrophobicity, which will help follow-up studies of their catalytic mechanism and of the way they interact with substrates. Based on this model, a user-friendly web server was established and is freely accessible at http://bioinfor.imu.edu.cn/ogferaac.
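To make the idea of recoding a sequence with a reduced amino acid alphabet concrete, the following minimal Python sketch maps each residue to a representative letter of its cluster and then computes the composition of the reduced sequence. The four-group clustering (`EXAMPLE_CLUSTERS`), loosely based on polarity and charge, and all function names are assumptions made for illustration; they are not one of the 673 alphabets or the feature scheme actually used by OGFE_RAAC.

```python
from collections import Counter

# Hypothetical 4-group reduced alphabet (illustrative only, not from the paper):
# nonpolar, polar uncharged, acidic, basic.
EXAMPLE_CLUSTERS = ["AVLIMFWPG", "STCYNQ", "DE", "KRH"]


def build_mapping(clusters):
    """Map every residue to the first letter of the cluster it belongs to."""
    return {aa: group[0] for group in clusters for aa in group}


def reduce_sequence(seq, mapping):
    """Recode a protein sequence; residues outside the mapping are dropped."""
    return "".join(mapping[aa] for aa in seq.upper() if aa in mapping)


def composition(reduced, clusters):
    """Return the frequency of each reduced letter, in cluster order."""
    counts = Counter(reduced)
    total = len(reduced) or 1  # avoid division by zero for empty input
    return [counts[group[0]] / total for group in clusters]


if __name__ == "__main__":
    mapping = build_mapping(EXAMPLE_CLUSTERS)
    reduced = reduce_sequence("MKDE", mapping)
    print(reduced)
    print(composition(reduced, EXAMPLE_CLUSTERS))
```

A feature vector of this kind could in principle be passed to any standard classifier; which alphabet and which features perform best is precisely what a search over many reduced alphabets is meant to determine.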