Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore.
Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia.
Nat Commun. 2024 Jul 3;15(1):5580. doi: 10.1038/s41467-024-49847-0.
DNA methylation plays an important role in various biological processes, including cell differentiation, ageing, and cancer development. The most important methylation in mammals is 5-methylcytosine (5mC), which occurs mostly at CpG dinucleotides. Sequencing methods such as whole-genome bisulfite sequencing (WGBS) successfully detect 5mC DNA modifications. However, they suffer from the serious drawbacks of short read lengths and possible amplification bias. Here we present Rockfish, a deep learning algorithm that significantly improves read-level 5mC detection from Nanopore sequencing data. Rockfish is compared with other Nanopore-based methods on R9.4.1 and R10.4.1 datasets. It increases single-base accuracy and F1 score by up to 5 percentage points on R9.4.1 datasets and by up to 0.82 percentage points on R10.4.1 datasets. Moreover, Rockfish shows a high correlation with WGBS, requires lower read depth, and achieves higher confidence in biologically important regions such as CpG-rich promoters while remaining computationally efficient. Its superior performance on human and mouse samples highlights its versatility for studying 5mC methylation across varied organisms and diseases. Finally, its adaptable architecture ensures compatibility with new pore versions, chemistries, and modification types.
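The abstract reports gains in single-base accuracy and F1 score, and a high correlation with WGBS. As a rough illustration of how such metrics are commonly computed (this is not Rockfish's own evaluation code), the sketch below assumes binary per-read CpG calls (1 = 5mC, 0 = unmethylated) scored against ground-truth labels, aggregates read-level calls into per-site methylation frequencies, and compares those with WGBS frequencies using Pearson correlation. Pearson is one common choice and an assumption here; the function names `accuracy_f1`, `percentage_points`, `site_frequencies`, and `pearson`, and the toy data, are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def accuracy_f1(y_true, y_pred):
    """Single-base accuracy and F1 for binary per-read CpG calls (1 = 5mC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def percentage_points(metric_a, metric_b):
    """Absolute difference of two proportions, in percentage points (not relative %)."""
    return (metric_a - metric_b) * 100


def site_frequencies(calls):
    """Aggregate (site_id, call) pairs into per-site fraction of reads called 5mC."""
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated calls, total calls]
    for site, call in calls:
        counts[site][0] += call
        counts[site][1] += 1
    return {site: m / n for site, (m, n) in counts.items()}


def pearson(xs, ys):
    """Pearson correlation, e.g. Nanopore vs. WGBS per-site methylation frequencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy ground truth and two hypothetical callers' read-level predictions.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    pred_a = [1, 1, 1, 0, 0, 1, 1, 0]
    pred_b = [1, 0, 1, 0, 0, 1, 1, 0]
    acc_a, f1_a = accuracy_f1(y_true, pred_a)
    acc_b, f1_b = accuracy_f1(y_true, pred_b)
    print(acc_a, f1_a, acc_b, f1_b)
    print(percentage_points(acc_a, acc_b))  # accuracy gain of caller A over caller B
```

On this toy data, caller A reaches 0.875 accuracy and caller B 0.75, a gain of 12.5 percentage points. The same subtraction underlies statements such as "up to 5 percentage points", which is an absolute difference and not a relative percent improvement.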