School of Computer Science and Technology, Xidian University, Xi'an, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
Methods. 2022 Jul;203:32-39. doi: 10.1016/j.ymeth.2021.05.016. Epub 2021 May 24.
N2-methylguanosine (m2G) is a post-transcriptional RNA modification found in eukaryotes and archaea. The biological functions of m2G discovered so far are controlling and stabilizing the three-dimensional structure of tRNA and the dynamic barrier of reverse transcription. To discover additional biological functions of m2G, time-saving and labor-saving computational tools for identifying m2G sites are needed. In this paper, a novel predictor, RFhy-m2G, based on hybrid features and a random forest, was developed to identify m2G modification sites in three species. The hybrid feature fuses three encodings: ENAC, PseDNC, and NPPS. Together, these cover primary-sequence-derived properties, physicochemical properties, and position-specific properties. Because the hybrid features contain redundant features, MRMD2.0 is used to select the optimal feature subset. Feature analysis shows that the optimal hybrid features still contain all three kinds of properties, and that the hybrid features identify m2G modification sites more accurately and improve prediction performance. When the prediction model was evaluated by five-fold cross-validation and independent testing, the accuracies obtained were 0.9982 and 0.9417, respectively. Comparisons with other predictors demonstrate the robustness of the predictor.
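The abstract names ENAC as one of the three fused encodings but does not define it. As a minimal sketch, assuming the commonly used definition of ENAC (enhanced nucleic acid composition: the frequency of each nucleotide inside a fixed-length window slid along the sequence), the encoding might look like the following. The function name `enac`, the default window size of 5, and the example sequence are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

# Nucleotide order used for every window; RNA alphabet assumed.
NUCLEOTIDES = "ACGU"


def enac(sequence, window=5):
    """Return the ENAC feature vector of an RNA sequence.

    For each window of `window` consecutive nucleotides, the frequency of
    A, C, G and U inside that window is appended to the feature vector.
    The window size is an assumed default, not a value from the paper.
    """
    # Accept DNA-style input by mapping T to U.
    seq = sequence.upper().replace("T", "U")
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")

    features = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        # Symbols outside ACGU (for example N) contribute zero frequency.
        features.extend(counts[n] / window for n in NUCLEOTIDES)
    return features


if __name__ == "__main__":
    # Hypothetical 7-nt fragment; real samples would be longer windows
    # centred on the candidate guanosine.
    vector = enac("GGACUGA", window=5)
    print(len(vector), vector[:4])
```

A sequence of length L with window size w yields 4 × (L − w + 1) values, so every sample in a dataset must have the same length for the vectors to be stacked into a feature matrix and concatenated with the other encodings.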