Tsukiyama Sho, Hasan Md Mehedi, Kurata Hiroyuki
Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
Tulane Center for Aging and Department of Medicine, Tulane University Health Sciences Center, New Orleans, LA 70112, USA.
Comput Struct Biotechnol J. 2022 Dec 28;21:644-654. doi: 10.1016/j.csbj.2022.12.043. eCollection 2023.
N6-methyladenine (6mA) plays a critical role in various epigenetic processes, including DNA replication, DNA repair, silencing, and transcription, and in diseases such as cancer. To understand these epigenetic mechanisms, 6mA has been detected on a genome-wide scale at single-base resolution by high-throughput technologies, together with conventional methods such as immunoprecipitation, mass spectrometry, and capillary electrophoresis, but these experimental approaches are time-consuming and laborious. To complement these approaches, we have developed a CNN-based 6mA site predictor, named CNN6mA, which introduces two new architectures: a position-specific 1-D convolutional layer and a cross-interactive network. In the position-specific 1-D convolutional layer, position-specific filters with different window sizes were applied to a query sequence, instead of sharing the same filters over all positions, in order to extract position-specific features at different levels. The cross-interactive network explored the relationships between all the nucleotide patterns within the query sequence. Consequently, CNN6mA outperformed the existing state-of-the-art models in many species and produced a contribution score vector that intelligibly interprets the prediction mechanism. The source code and web application of CNN6mA are freely accessible at https://github.com/kuratahiroyuki/CNN6mA.git and http://kurata35.bio.kyutech.ac.jp/CNN6mA/, respectively.
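The idea behind a position-specific 1-D convolution, as opposed to an ordinary convolution that shares one filter bank across all positions, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). The one-hot encoding, the example sequence, the window sizes (3 and 5), the filter count (8), and the function names are all assumptions chosen for illustration.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed encoding order; the published model may differ


def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot matrix."""
    idx = np.array([ALPHABET.index(base) for base in seq])
    return np.eye(4)[idx]


def position_specific_conv1d(x, weights, bias):
    """Apply a separate filter bank at every window position.

    x:       (L, C) encoded sequence
    weights: (P, w, C, F), where P = L - w + 1 window positions,
             w is the window size, and F is the number of filters
    bias:    (P, F)
    returns: (P, F) feature map
    """
    P, w, C, F = weights.shape
    assert x.shape == (P + w - 1, C), "sequence length must equal P + w - 1"
    # windows[p] is the w-long slice of x starting at position p: (P, w, C)
    windows = np.stack([x[p:p + w] for p in range(P)])
    # Unlike a shared convolution, weights[p] differs for each position p.
    return np.einsum("pwc,pwcf->pf", windows, weights) + bias


rng = np.random.default_rng(0)
seq = "ACGTTGCAATGCA"  # hypothetical 13-nt query sequence
x = one_hot(seq)

# Several window sizes extract features at different levels (sizes assumed).
outputs = {}
for w in (3, 5):
    P = len(seq) - w + 1
    W = rng.normal(size=(P, w, 4, 8))  # random stand-in for learned weights
    b = np.zeros((P, 8))
    outputs[w] = position_specific_conv1d(x, W, b)
    print(w, outputs[w].shape)
```

Because the weights carry a leading position axis, the same nucleotide pattern can yield different features depending on where it occurs in the sequence. The trade-off is that the parameter count grows with sequence length, which a shared-filter convolution avoids.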