Bi Yue, Xiang Dongxu, Ge Zongyuan, Li Fuyi, Jia Cangzhi, Song Jiangning
School of Science, Dalian Maritime University, Dalian 116026, China.
Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.
Mol Ther Nucleic Acids. 2020 Aug 25;22:362-372. doi: 10.1016/j.omtn.2020.08.022. eCollection 2020 Dec 4.
Recent studies increasingly show that chemical modification of mRNA plays an important role in regulating gene expression. N7-methylguanosine (m7G) is a positively charged mRNA modification that is essential for efficient gene expression and cell viability. However, m7G has received little research attention to date. Bioinformatics tools can serve as auxiliary methods for identifying m7G sites in transcriptomes. In this study, we develop a novel interpretable machine learning-based approach, termed XG-m7G, for identifying m7G sites using the XGBoost algorithm and six different types of sequence-encoding schemes. Both 10-fold and jackknife cross-validation tests indicate that XG-m7G outperforms the existing predictor iRNA-m7G. Moreover, using the SHAP algorithm, the framework also provides interpretations of the model's performance and highlights the most important features for identifying m7G sites. XG-m7G is anticipated to serve as a useful tool and guide for researchers in future studies of mRNA modification sites.
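The two technical ingredients named in the abstract, sequence encoding and 10-fold versus jackknife cross-validation, can be illustrated with a minimal sketch. The abstract does not list the six encoding schemes, so one-hot encoding is used here purely as an assumed example of a sequence-encoding scheme. The names `one_hot_encode` and `k_fold_indices` are hypothetical helpers and are not taken from XG-m7G. The resulting vectors would then be passed to a classifier such as XGBoost, which is not shown.

```python
from typing import Iterator, List, Tuple

NUCLEOTIDES = "ACGU"  # fixed column order for the encoding


def one_hot_encode(seq: str) -> List[int]:
    """Flatten an RNA sequence into a binary vector, 4 bits per position.

    Illustrative assumption: the abstract does not say which encodings
    XG-m7G uses; one-hot is just a common choice.
    """
    vector: List[int] = []
    for base in seq.upper().replace("T", "U"):
        # Symbols outside A/C/G/U (e.g. N) become four zeros.
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds.

    With k == n_samples every test fold holds exactly one sample,
    which is the jackknife (leave-one-out) test.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

Calling `k_fold_indices(n, 10)` gives the 10-fold splits, and `k_fold_indices(n, n)` gives the jackknife splits. The folds are contiguous, so in practice the samples would be shuffled before splitting.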