Wang Rulan, Chung Chia-Ru, Huang Hsien-Da, Lee Tzong-Yi
School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 518172, Shenzhen, P.R. China.
Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 518172, Shenzhen, P.R. China.
Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbac573.
N6-methyladenosine (m6A) is the most abundant co-transcriptional modification in eukaryotic RNA and plays important roles in cellular regulation. The traditional high-throughput sequencing experiments used to explore its functional mechanisms are time-consuming and labor-intensive, and most existing prediction methods cover only a limited range of species. To better understand how the same RNA modification works across species, a computational scheme that can be applied to different species is needed. To this end, we propose adaptive-m6A, an attention-based deep learning method that combines a convolutional neural network, a bi-directional long short-term memory network and an attention mechanism to identify m6A sites in multiple species. Three conventional machine learning (ML) methods (support vector machine, random forest and logistic regression classifiers) were also evaluated for multi-species prediction. At its best, adaptive-m6A achieved an accuracy of 0.9832 and an area under the receiver operating characteristic curve of 0.98. Moreover, motif analysis and cross-validation among different species were conducted to test the robustness of a single model across multiple species, which improved our understanding of the sequence characteristics and biological functions of RNA modifications in different species.
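As a rough illustration of the attention step described in the abstract, the sketch below one-hot encodes an RNA window and pools per-position features with dot-product attention. It is not the adaptive-m6A implementation: in the published model the features would come from the convolutional and bi-directional LSTM layers and the attention parameters would be learned. Here the attention is applied directly to the one-hot vectors, and the names `one_hot`, `attention_pool` and the `query` vector are hypothetical, chosen only for this example.

```python
import math

NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """Encode an RNA sequence as a list of 4-dim one-hot vectors (A, C, G, U).

    DNA-style 'T' is treated as 'U'; any other symbol maps to all zeros.
    """
    table = {
        nt: [1.0 if i == j else 0.0 for j in range(4)]
        for i, nt in enumerate(NUCLEOTIDES)
    }
    unknown = [0.0] * 4
    return [table.get(nt, unknown) for nt in seq.upper().replace("T", "U")]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Dot-product attention pooling over sequence positions.

    features: list of per-position vectors (assumed here to be one-hot
              vectors; in a real model, CNN/BiLSTM outputs).
    query:    hypothetical scoring vector; a trained model would learn it.
    Returns (context_vector, attention_weights).
    """
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * vec[d] for w, vec in zip(weights, features)) for d in range(dim)
    ]
    return context, weights


# Example: a 5-nt window with the candidate adenosine in the centre.
window = "GGACU"
features = one_hot(window)
query = [2.0, 0.0, 0.0, 0.0]  # assumed query that favours 'A' positions
context, weights = attention_pool(features, query)
```

The attention weights sum to one over the window positions, so they can be read as the relative contribution of each position to the pooled representation; per-position weights of this kind are one way to relate a model's predictions back to sequence motifs.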