Zhang Ying, Ge Fang, Li Fuyi, Yang Xibei, Song Jiangning, Yu Dong-Jun
IEEE/ACM Trans Comput Biol Bioinform. 2023 Sep-Oct;20(5):3205-3214. doi: 10.1109/TCBB.2023.3283985. Epub 2023 Oct 9.
It has been demonstrated that RNA modifications play essential roles in multiple biological processes. Accurately identifying RNA modifications across the transcriptome is therefore critical for understanding their biological functions and mechanisms. Many tools have been developed to predict RNA modifications at single-base resolution, but they rely on conventional feature engineering, whose feature design and feature selection steps demand extensive biological expertise and may introduce redundant information. With the rapid development of artificial intelligence, end-to-end methods have been favorably received by researchers. Nevertheless, in nearly all of these approaches, each well-trained model is suitable for only one specific type of RNA methylation modification. In this study, we present MRM-BERT, which feeds task-specific sequences into the powerful BERT (Bidirectional Encoder Representations from Transformers) model and fine-tunes it, achieving performance competitive with the state-of-the-art methods. MRM-BERT avoids repeated de novo training of the model and can predict multiple RNA modifications, such as pseudouridine, m6A, m5C, and m1A, in Mus musculus, Arabidopsis thaliana, and Saccharomyces cerevisiae. In addition, we analyse the attention heads to reveal the high-attention regions underlying each prediction, and conduct saturated in silico mutagenesis of the input sequences to discover potential changes in RNA modifications, which can better assist researchers in their follow-up studies.
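The saturated in silico mutagenesis described above can be sketched as follows: every position of an input RNA sequence is substituted with every alternative base, and the change in the model's predicted modification probability is recorded. This is a minimal illustration only; `predict_modification` is a toy stand-in (it is not MRM-BERT, which would return a fine-tuned BERT probability), and the function names are assumptions for the sketch.

```python
# Sketch of saturated in silico mutagenesis for an RNA sequence.
# `predict_modification` is a hypothetical stand-in scorer; the real
# MRM-BERT model would supply the modification probability instead.

def predict_modification(seq: str) -> float:
    # Toy scorer for illustration: GC content of the 5-nt window
    # around the sequence centre. NOT the actual MRM-BERT output.
    centre = seq[len(seq) // 2 - 2 : len(seq) // 2 + 3]
    return sum(b in "GC" for b in centre) / len(centre)

def saturated_mutagenesis(seq: str) -> dict:
    """Substitute every position with every alternative base and
    record the change in predicted modification probability."""
    base_score = predict_modification(seq)
    effects = {}
    for i, ref in enumerate(seq):
        for alt in "ACGU":
            if alt == ref:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            effects[(i, ref, alt)] = predict_modification(mutant) - base_score
    return effects

effects = saturated_mutagenesis("AUGGCUACUGA")
# Substitutions that most decrease the predicted probability point to
# positions the (toy) model considers important for the modification.
top_disruptive = sorted(effects.items(), key=lambda kv: kv[1])[:3]
```

Ranking the recorded score changes in this way highlights which single-base substitutions are predicted to create or abolish a modification site, which is how such a scan can guide follow-up experiments.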