Zhang Shengli, Wang Jinyue, Li Xinjie, Liang Yunyun
School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China.
School of Science, Xi'an Polytechnic University, Xi'an, P. R. China.
J Biomol Struct Dyn. 2022;40(22):12380-12391. doi: 10.1080/07391102.2021.1970628. Epub 2021 Aug 30.
N6-methyladenosine (m6A) is one of the most abundant forms of RNA methylation currently known. It is involved in a wide range of biological processes, including RNA degradation, stability and alternative splicing. The development of convenient and efficient m6A prediction methods is therefore urgent. In this work, a novel predictor called M6A-GSMS, based on gradient boosting decision tree (GBDT) and stacking learning, is developed to identify m6A sites. To achieve accurate prediction, we extract RNA sequence information from four aspects: correlation, structure, physicochemical properties and pseudo ribonucleic acid composition. After feature selection with the GBDT algorithm, a stacking model is constructed by combining seven base classifiers. Comparison with other state-of-the-art methods shows that M6A-GSMS achieves excellent performance in identifying m6A sites. On the five datasets evaluated, the prediction accuracy reaches 88.4%, 60.8%, 80.5%, 92.4% and 61.8%, respectively. This method provides an effective prediction tool for the investigation of m6A sites. All datasets and code are available at https://github.com/Wang-Jinyue/M6A-GSMS.
Communicated by Ramaswamy H. Sarma.
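The two-stage design described in the abstract (GBDT-based feature selection followed by a stacked ensemble) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. The synthetic data, the median importance threshold, the three base learners shown here (the abstract mentions seven but does not name them), the logistic-regression meta-learner, and the helper names `build_model` and `demo` are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline


def build_model(random_state=0):
    """Hypothetical pipeline: GBDT feature selection, then a stacked ensemble."""
    # Stage 1: fit a GBDT and keep features whose importance is at or above
    # the median importance (the threshold is an assumption).
    selector = SelectFromModel(
        GradientBoostingClassifier(n_estimators=50, random_state=random_state),
        threshold="median",
    )
    # Stage 2: base learners whose cross-validated predictions feed a
    # meta-learner. These three are placeholders, not the paper's seven.
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("nb", GaussianNB()),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    return make_pipeline(selector, stack)


def demo(random_state=0):
    """Train on synthetic data standing in for encoded RNA sequence features."""
    X, y = make_classification(
        n_samples=300, n_features=40, n_informative=8, random_state=random_state
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    model = build_model(random_state)
    model.fit(X_train, y_train)
    # Count how many of the 40 input features survived the GBDT selection.
    n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
    return n_selected, model.score(X_test, y_test)


if __name__ == "__main__":
    n_selected, accuracy = demo()
    print(f"features kept: {n_selected}, held-out accuracy: {accuracy:.3f}")
```

Placing the selector inside the pipeline means the feature subset is learned only from the training split, which avoids leaking test information into the selection step. The actual feature encodings and classifier choices are in the repository linked above.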