College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
Polytech Nantes, Bâtiment Ireste, 44300 Nantes, France.
Genes (Basel). 2021 Feb 28;12(3):354. doi: 10.3390/genes12030354.
As a prevalent post-transcriptional modification of RNA, N6-methyladenosine (m6A) plays a crucial role in various biological processes. To better reveal its regulatory mechanism and provide new insights for drug design, accurate genome-wide identification of m6A sites is vital. Because traditional experimental methods are time-consuming and cost-prohibitive, a more efficient computational method for detecting m6A sites is needed. In this study, we propose DNN-m6A, a novel cross-species computational method based on a deep neural network (DNN), to identify m6A sites in multiple tissues of human, mouse and rat. First, binary encoding (BE), tri-nucleotide composition (TNC), enhanced nucleic acid composition (ENAC), K-spaced nucleotide pair frequencies (KSNPFs), nucleotide chemical property (NCP), pseudo dinucleotide composition (PseDNC), position-specific nucleotide propensity (PSNP) and position-specific dinucleotide propensity (PSDP) are employed to extract RNA sequence features, which are then fused to construct the initial feature vector set. Second, we use elastic net to eliminate redundant features and build the optimal feature subset. Finally, the hyper-parameters of the DNN are tuned with Bayesian hyper-parameter optimization based on the selected feature subset. Five-fold cross-validation on the training datasets shows that the proposed DNN-m6A method outperforms the state-of-the-art method for predicting m6A sites, with an accuracy (ACC) of 73.58%-83.38% and an area under the curve (AUC) of 81.39%-91.04%. Furthermore, on the independent datasets the method achieved an ACC of 72.95%-83.04% and an AUC of 80.79%-91.09%, demonstrating excellent generalization ability.
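To make the feature-extraction step more concrete, the following is a minimal sketch of three of the eight encodings named above (BE, NCP and TNC) and of their fusion by concatenation. It is an illustration under stated assumptions, not the authors' implementation: the function names, the A/C/G/U column ordering and the example sequence are hypothetical, the NCP triples follow the mapping commonly used in the literature (ring structure, functional group, hydrogen bonding), and the abstract does not specify how the fused vector is ordered. The remaining encodings, the elastic net selection and the Bayesian tuning of the DNN are not covered here.

```python
from itertools import product

ALPHABET = "ACGU"  # column ordering is an assumption, not taken from the paper

# Binary encoding (BE): one-hot vector of length 4 per nucleotide.
BE_TABLE = {nt: [1 if i == j else 0 for j in range(4)]
            for i, nt in enumerate(ALPHABET)}

# Nucleotide chemical property (NCP), commonly used mapping:
# (ring structure: purine=1, functional group: amino=1, hydrogen bond: weak=1).
NCP_TABLE = {"A": [1, 1, 1], "C": [0, 1, 0], "G": [1, 0, 0], "U": [0, 0, 1]}

# All 64 trinucleotides, in a fixed order, for TNC.
TRIMERS = ["".join(p) for p in product(ALPHABET, repeat=3)]


def binary_encoding(seq):
    """Concatenate the one-hot vector of every position (4 * len(seq) values)."""
    return [bit for nt in seq for bit in BE_TABLE[nt]]


def ncp_encoding(seq):
    """Concatenate the chemical-property triple of every position (3 * len(seq) values)."""
    return [value for nt in seq for value in NCP_TABLE[nt]]


def tnc_encoding(seq):
    """Relative frequency of each of the 64 trinucleotides in the sequence."""
    total = len(seq) - 2  # number of overlapping 3-mers
    counts = {trimer: 0 for trimer in TRIMERS}
    for i in range(total):
        counts[seq[i:i + 3]] += 1
    return [counts[trimer] / total for trimer in TRIMERS]


def fuse_features(seq):
    """Fuse the three encodings by simple concatenation (assumed fusion scheme)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return binary_encoding(seq) + ncp_encoding(seq) + tnc_encoding(seq)


# Hypothetical 5-nt example; real m6A samples use longer windows centred on the site.
features = fuse_features("GGACU")
print(len(features))  # 5*4 + 5*3 + 64 = 99
```

In a full pipeline along the lines the abstract describes, a fused vector like this would be computed for every sample, reduced with elastic net, and then passed to the DNN.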