Chen Zheng, Ni Peng, Wang Jianxin
School of Computer Science and Engineering, Central South University, Changsha 410083, China.
Xiangjiang Laboratory, Changsha 410205, China.
Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf397.
DNA methylation plays important roles in various cellular physiological processes in bacteria. Nanopore sequencing can identify different types of DNA methylation directly from individual bacteria. However, existing methods for identifying bacterial methylomes perform inconsistently across methylation motifs and do not fully exploit the multi-scale information contained in nanopore signals.
We propose Nanoident, a deep-learning method for de novo detection of DNA methylation types and methylated base positions in bacteria using nanopore sequencing. For each targeted motif sequence, Nanoident uses five different features, including statistical features extracted from both the raw nanopore signals and the basecalling results of the motif. All five features are processed by a multi-scale neural network, which extracts information from different receptive fields of the features. In leave-one-out cross-validation (LOOCV) on a dataset of 7 bacterial samples with 46 methylation motifs, Nanoident achieves ∼10% higher accuracy than the previous method. On an independent dataset containing 12 methylation motifs, Nanoident achieves ∼13% higher accuracy. Additionally, we optimize the pipeline for de novo methylation motif enrichment, enabling the discovery of novel methylation motifs.
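To illustrate what processing features at different receptive fields can mean, the following is a minimal sketch, not Nanoident's actual network. It assumes five hypothetical per-position feature tracks over a motif window and replaces learned convolutions with fixed mean filters of widths 1, 3 and 5; the function names, the widths and the toy values are all assumptions made for illustration.

```python
from typing import List, Sequence


def smooth(track: Sequence[float], width: int) -> List[float]:
    """Mean filter with an odd window width and zero padding (output keeps the input length)."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    padded = [0.0] * half + list(track) + [0.0] * half
    return [sum(padded[i:i + width]) / width for i in range(len(track))]


def multi_scale(features: Sequence[Sequence[float]],
                widths: Sequence[int] = (1, 3, 5)) -> List[List[float]]:
    """Apply every window width to every feature track and stack the results.

    Each width stands in for one receptive field; a real model would use
    learned convolution kernels instead of fixed averaging windows.
    """
    if len({len(track) for track in features}) != 1:
        raise ValueError("all feature tracks must have the same length")
    return [smooth(track, w) for track in features for w in widths]


# Five hypothetical feature tracks over a 6-position motif window (toy values).
features = [
    [0.1, 0.4, 0.9, 0.8, 0.3, 0.1],  # assumed: signal mean
    [0.2, 0.2, 0.5, 0.6, 0.2, 0.1],  # assumed: signal standard deviation
    [0.0, 0.1, 0.7, 0.6, 0.1, 0.0],  # assumed: signal median shift
    [0.3, 0.3, 0.4, 0.5, 0.3, 0.2],  # assumed: dwell time
    [0.0, 0.0, 0.6, 0.4, 0.0, 0.0],  # assumed: basecalling error rate
]

stacked = multi_scale(features)
print(len(stacked), len(stacked[0]))  # 15 6
```

Each input track yields one output row per window width, so five tracks and three widths give 15 rows that a downstream classifier could consume together.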
The source code of Nanoident is freely available at https://github.com/cz-csu/Nanoident and https://doi.org/10.6084/m9.figshare.29252264.