Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
Division of Medical Genetics, University of Washington, Seattle, Washington 98195, USA.
Genome Res. 2024 Nov 20;34(11):1976-1986. doi: 10.1101/gr.279095.124.
Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation and the identification of exogenously placed DNA N6-methyladenine (DNA-m6A). However, detecting DNA-m6A modifications with single-molecule sequencing, and coprocessing single-molecule genetic and epigenetic architectures, are limited by computational demands and a lack of supporting tools. Here, we introduce a state-of-the-art toolkit that features a semisupervised convolutional neural network for fast and accurate identification of m6A-marked bases in Pacific Biosciences (PacBio) single-molecule long-read sequencing data, and that supports the coprocessing of long-read genetic and epigenetic data produced on either the PacBio or the Oxford Nanopore Technologies (ONT) sequencing platform. We demonstrate accurate DNA-m6A identification (>90% precision and recall) along DNA molecules longer than 20 kb, with an ∼1000-fold improvement in speed. In addition, we demonstrate that the toolkit can readily integrate genetic and epigenetic data at single-molecule resolution, including seamless conversion between molecular and reference coordinate systems, enabling accurate genetic and epigenetic analyses of long-read data within structurally and somatically variable genomic regions.
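Converting between molecular and reference coordinates amounts, in general, to walking along a read's alignment. The sketch below is a minimal illustration, assuming a SAM-style CIGAR string and 0-based positions measured along the stored read sequence (for example, the positions of m6A calls on one molecule). The function `molecule_to_reference` is hypothetical and is not the toolkit's API or implementation; molecule positions that fall in insertions or soft clips have no reference counterpart and map to `None`.

```python
import re
from typing import Dict, List, Optional

# One CIGAR operation: a run length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Per the SAM spec: which operations advance along the read vs. the reference.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def molecule_to_reference(
    cigar: str, ref_start: int, positions: List[int]
) -> Dict[int, Optional[int]]:
    """Map 0-based molecule (read) positions to 0-based reference positions.

    Positions inside insertions or soft clips map to None because they are
    not aligned to any reference base. Hard clips and padding consume neither
    coordinate system, so they are simply skipped.
    """
    wanted = set(positions)
    result: Dict[int, Optional[int]] = {p: None for p in positions}
    q, r = 0, ref_start  # current molecule and reference offsets
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        cq, cr = op in CONSUMES_QUERY, op in CONSUMES_REF
        if cq and cr:
            # Aligned block: molecule and reference advance in lockstep.
            for i in range(length):
                if q + i in wanted:
                    result[q + i] = r + i
        if cq:
            q += length
        if cr:
            r += length
    return result


if __name__ == "__main__":
    # Read with 2 soft-clipped bases, a 1-base insertion, and a 2-base deletion.
    print(molecule_to_reference("2S3M1I2M2D3M", 100, [0, 2, 5, 6, 8, 10]))
```

For real alignments in BAM files, `pysam`'s `AlignedSegment.get_aligned_pairs()` returns the same kind of read-to-reference position pairs and is a more robust starting point than parsing CIGAR strings by hand.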