Rehman Mobeen Ur, Tayara Hilal, Chong Kil To
IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):904-911. doi: 10.1109/TCBB.2022.3192572. Epub 2023 Apr 3.
N6-methyladenosine (m6A) is a common post-transcriptional modification that plays a critical role in a variety of biological processes. Although experimental approaches for identifying m6A sites have been developed and deployed, they remain expensive for transcriptome-wide m6A identification. Several computational strategies for identifying m6A sites have been presented as an effective complement to experimental procedures, but their performance still requires improvement. In this study, we propose a novel tool called DL-m6A that identifies m6A sites in mammals using deep learning based on different encoding schemes. The tool uses three encoding schemes, which together give the required contextual feature representation of the input RNA sequence. These contextual feature vectors first pass individually through several neural network layers for shallow feature extraction, after which they are concatenated into a single feature vector. The concatenated feature map is then passed through several further layers that extract deep features, so that the intrinsic features of the sequence can be used for the prediction of m6A sites. The tool is evaluated first on a tissue-specific dataset and then on a full transcript dataset. To assess the generalizability of the tool, we also trained the model on the full transcript dataset and tested it on the tissue-specific dataset. The proposed model outperformed the existing tools. The results demonstrate that the tool can be of great use to biology experts, and a freely accessible web server has therefore been created, which can be accessed at: http://nsclbio.jbnu.ac.kr/tools/DL-m6A/.
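The multi-encoding idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the three encoding schemes, so the ones used here (one-hot, nucleotide chemical property, and cumulative nucleotide density) are assumptions chosen because they are common in RNA sequence models, and all function names are hypothetical. The neural network layers are not reproduced: a plain flatten step stands in for the per-branch shallow feature extraction, and only the encode-then-concatenate structure is shown.

```python
from typing import Dict, List

# ASSUMPTION: these three encodings are illustrative choices only; the
# abstract does not state which schemes DL-m6A actually uses.
ONE_HOT: Dict[str, List[float]] = {
    "A": [1.0, 0.0, 0.0, 0.0],
    "C": [0.0, 1.0, 0.0, 0.0],
    "G": [0.0, 0.0, 1.0, 0.0],
    "U": [0.0, 0.0, 0.0, 1.0],
}

# Nucleotide chemical property: (ring structure, functional group, hydrogen bond).
NCP: Dict[str, List[float]] = {
    "A": [1.0, 1.0, 1.0],
    "C": [0.0, 1.0, 0.0],
    "G": [1.0, 0.0, 0.0],
    "U": [0.0, 0.0, 1.0],
}


def encode_one_hot(seq: str) -> List[List[float]]:
    """One 4-dimensional indicator vector per nucleotide."""
    return [ONE_HOT[base] for base in seq]


def encode_ncp(seq: str) -> List[List[float]]:
    """One 3-dimensional chemical-property vector per nucleotide."""
    return [NCP[base] for base in seq]


def encode_density(seq: str) -> List[List[float]]:
    """Cumulative frequency of each nucleotide up to and including its position."""
    counts: Dict[str, int] = {}
    rows: List[List[float]] = []
    for position, base in enumerate(seq, start=1):
        counts[base] = counts.get(base, 0) + 1
        rows.append([counts[base] / position])
    return rows


def flatten(matrix: List[List[float]]) -> List[float]:
    """Hypothetical stand-in for a branch's shallow feature extractor."""
    return [value for row in matrix for value in row]


def build_feature_vector(seq: str) -> List[float]:
    """Encode a sequence three ways and concatenate the branch outputs."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    branches = [encode_one_hot(seq), encode_ncp(seq), encode_density(seq)]
    branch_features = [flatten(branch) for branch in branches]
    # Concatenation into a single feature vector, as the abstract describes.
    return [value for features in branch_features for value in features]


window = "GGACU"  # toy window, not taken from the paper's datasets
features = build_feature_vector(window)
print(len(features))  # 5 positions * (4 + 3 + 1) channels = 40
```

In a real model, each branch would feed its own learned layers before concatenation, and further layers would operate on the concatenated vector to produce the site prediction.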