School of Information and Communication Engineering, Hainan University, Haikou, Hainan, China.
Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, Hainan, China.
PLoS One. 2024 Oct 31;19(10):e0301791. doi: 10.1371/journal.pone.0301791. eCollection 2024.
In this study, we approach the identification of DNA methylation sites from an image-processing perspective and propose the iDNA-ITLM model. The model uses a novel data augmentation strategy: a short DNA sequence is repeatedly self-replicated into a longer sequence, which is then embedded into a high-dimensional matrix to enlarge the receptive field. Evaluated on 17 benchmark datasets that cover multiple species and three DNA methylation modifications (4mC, 5hmC, and 6mA), our model consistently outperforms current state-of-the-art sequence-based methods for DNA methylation site recognition. The experimental results demonstrate the robustness and superior performance of our model across these datasets. In addition, the model transfers to RNA methylation sequences and produces good results without any change to its hyperparameters. The proposed iDNA-ITLM model can therefore be considered a universal predictor across DNA and RNA methylation species.
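The self-replication step described above can be illustrated with a minimal sketch. This is not the iDNA-ITLM implementation: the function names `self_replicate` and `one_hot_matrix`, the copy count of 4, and the use of one-hot encoding as the matrix embedding are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

# Assumed alphabet; the abstract does not state how bases are tokenized.
NUCLEOTIDES = "ACGT"


def self_replicate(seq: str, copies: int) -> str:
    """Concatenate `copies` repetitions of a short DNA sequence (hypothetical helper)."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return seq * copies


def one_hot_matrix(seq: str) -> np.ndarray:
    """Map a DNA string to a (length, 4) matrix.

    One-hot encoding is an assumption standing in for the paper's
    high-dimensional embedding. Unknown symbols (e.g. N) stay all-zero.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    matrix = np.zeros((len(seq), len(NUCLEOTIDES)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        col = index.get(base)
        if col is not None:
            matrix[pos, col] = 1.0
    return matrix


short = "ACGTTGCA"                          # toy 8-base input
longer = self_replicate(short, copies=4)    # 32 bases after replication
features = one_hot_matrix(longer)           # matrix of shape (32, 4)
```

Repeating the input gives a downstream model with a fixed kernel or attention window more positions to cover, which is one plausible reading of "enlarging the receptive field"; the actual embedding and network used by the authors may differ.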