School of Information Engineering, Shaoyang University, Shaoyang, 422000, China.
College of Information and Intelligence, Hunan Agricultural University, Changsha, 410128, China.
BMC Bioinformatics. 2023 Jan 18;24(1):21. doi: 10.1186/s12859-023-05135-0.
N4-methylcytosine (4mC) is an important epigenetic modification that regulates many cellular processes, such as cell differentiation and gene expression. Knowledge of 4mC sites is a key foundation for exploring its roles. Owing to the limitations of current techniques, precise detection of 4mC remains a challenging task. In this paper, we present MultiScale-CNN-4mCPred, a computational method based on a multi-scale convolutional neural network (CNN) and adaptive embedding for predicting 4mC sites in the mouse genome. MultiScale-CNN-4mCPred uses adaptive embedding to encode nucleotides and then applies multi-scale CNNs together with long short-term memory (LSTM) to extract deeper local properties and contextual semantics from the sequences. It is an end-to-end learning method that requires no sophisticated feature design. MultiScale-CNN-4mCPred reached an accuracy of 81.66% in 10-fold cross-validation and 84.69% in the independent test, outperforming state-of-the-art methods. We implemented the proposed method as a user-friendly web application, freely available at: http://www.biolscience.cn/MultiScale-CNN-4mCPred/ .
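The encoding and multi-scale convolution steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding dimension (8), the kernel widths (3, 5, 7), the random weights, and all function names are assumptions chosen for illustration, and the LSTM and classification layers are omitted. The sketch only shows how an embedding lookup feeds several convolution branches of different widths, whose outputs are concatenated at each sequence position.

```python
import random

NUCLEOTIDE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed vocabulary


def encode(sequence):
    """Map a DNA string to integer indices for an embedding lookup."""
    return [NUCLEOTIDE_INDEX[base] for base in sequence.upper()]


def init_embedding(vocab_size, dim, rng):
    """Random embedding table; in a trained model these weights are learned."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]


def init_kernel(width, dim, rng):
    """One convolution filter of shape (width, dim), randomly initialised."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(width)]


def conv1d_same(embedded, kernel):
    """Slide one filter over the sequence with zero padding, then apply ReLU."""
    width, dim = len(kernel), len(kernel[0])
    pad = width // 2  # odd widths keep the output as long as the input
    zero = [0.0] * dim
    padded = [zero] * pad + embedded + [zero] * pad
    out = []
    for start in range(len(embedded)):
        total = 0.0
        for k in range(width):
            for d in range(dim):
                total += padded[start + k][d] * kernel[k][d]
        out.append(max(0.0, total))
    return out


def multiscale_features(sequence, embedding, kernels):
    """Embed the sequence and concatenate all kernel-width outputs per position."""
    embedded = [embedding[i] for i in encode(sequence)]
    branches = [conv1d_same(embedded, kernel) for kernel in kernels]
    # One feature vector per position, with one channel per kernel width.
    return [list(channels) for channels in zip(*branches)]


rng = random.Random(0)
embedding = init_embedding(vocab_size=4, dim=8, rng=rng)
kernels = [init_kernel(width, 8, rng) for width in (3, 5, 7)]  # assumed widths
features = multiscale_features("ACGTTGCAAC", embedding, kernels)
print(len(features), len(features[0]))  # prints "10 3"
```

Each position of the 10-nucleotide input ends up with three channels, one per kernel width. In a full model of the kind the abstract describes, such per-position features would be passed to an LSTM and a classifier, and every weight would be learned end to end rather than drawn at random.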