Abbas Zeeshan, Tayara Hilal, Zou Quan, Chong Kil To
Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea.
Institute of Avionics and Aeronautics (IAA), Air University, Islamabad 44000, Pakistan.
Comput Struct Biotechnol J. 2021 Aug 10;19:4619-4625. doi: 10.1016/j.csbj.2021.08.014. eCollection 2021.
The most common post-transcriptional modification, N6-methyladenosine (m6A), is associated with a number of crucial biological processes. Precise detection of m6A sites across the genome is critical for revealing its regulatory function and for providing new insights into drug design. Although both experimental and computational methods for detecting m6A sites have been introduced, these conventional methods are laborious and expensive. Furthermore, only a handful of them can detect m6A sites in multiple tissues. A more generic and optimized computational method for detecting m6A sites in different tissues is therefore required. In this paper, we propose a universal model based on a deep neural network (DNN), named TS-m6A-DL, which can classify m6A sites in several tissues of humans (Homo sapiens), mice (Mus musculus), and rats (Rattus norvegicus). To extract RNA sequence features and convert the input into a numerical format for the network, we used one-hot encoding. The model was evaluated using fivefold cross-validation, and its stability was measured on independent datasets. TS-m6A-DL achieved accuracies in the range of 75-85% under fivefold cross-validation and 72-84% on the independent datasets. Finally, to verify the generalization of the model, we performed cross-species testing, in which the model achieved state-of-the-art results.
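The two preprocessing and evaluation steps named in the abstract, one-hot encoding of RNA sequences and fivefold cross-validation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `one_hot_encode` and `k_fold_indices`, the channel order A, C, G, U, the all-zero row for unknown bases, and the unshuffled contiguous folds are all assumptions made for illustration.

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not taken from the TS-m6A-DL paper.

ALPHABET = "ACGU"  # assumed channel order for the one-hot vectors


def one_hot_encode(seq):
    """Return a len(seq) x 4 list of 0/1 rows for an RNA sequence.

    DNA-style 'T' is treated as 'U'. Any other symbol (for example 'N')
    is encoded as an all-zero row, which is one common convention.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in seq.upper().replace("T", "U"):
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous (train, test) index pairs.

    No shuffling is done here; in practice the samples would usually be
    shuffled (and often stratified by label) before splitting.
    """
    # Distribute the remainder over the first n_samples % k folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    print(one_hot_encode("GAC"))  # [[0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
    for train_idx, test_idx in k_fold_indices(10, k=5):
        print(len(train_idx), test_idx)
```

Each sequence becomes a matrix with one row per nucleotide, which is the kind of numerical input a neural network can consume directly. In each of the five folds, one part of the data is held out for testing while the remaining four parts are used for training.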