Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea.
BK21 FOUR Intelligence Computing, Seoul National University, Seoul, Republic of Korea.
PLoS Comput Biol. 2023 Mar 20;19(3):e1010946. doi: 10.1371/journal.pcbi.1010946. eCollection 2023 Mar.
Phased DNA methylation states within bisulfite sequencing reads are a valuable source of information that can be used to estimate epigenetic diversity across cells as well as epigenomic instability in individual cells. Various measures that capture the heterogeneity of DNA methylation states have been proposed over the past decade. However, routine analyses of DNA methylation often ignore this heterogeneity by computing average methylation levels at CpG sites, even though the information is present in bisulfite sequencing data in the form of phased methylation states, or methylation patterns. In this study, to facilitate the application of DNA methylation heterogeneity measures in downstream epigenomic analyses, we present Metheor, a Rust-based, extremely fast and lightweight bioinformatics toolkit. Because the analysis of DNA methylation heterogeneity requires examining pairs or groups of CpGs throughout the genome, existing software suffers from a high computational burden, which makes large-scale DNA methylation heterogeneity studies almost intractable for researchers with limited resources. We benchmark the performance of Metheor against existing implementations of DNA methylation heterogeneity measures in three different scenarios of simulated bisulfite sequencing datasets. Metheor reduced execution time by up to 300-fold and memory footprint by up to 60-fold while producing results identical to those of the original implementations, thereby facilitating large-scale studies of DNA methylation heterogeneity profiles. To demonstrate the utility of Metheor's low computational burden, we show that the methylation heterogeneity profiles of 928 cancer cell lines can be computed with standard computing resources. With those profiles, we reveal associations between DNA methylation heterogeneity and various omics features.
The source code for Metheor is freely available at https://github.com/dohlee/metheor under the GPL-3.0 license.
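To illustrate why average methylation levels can hide heterogeneity, the following minimal Python sketch compares two hypothetical sets of reads that each cover the same four CpGs. It is not Metheor's implementation or interface. The function names, the 0/1 string encoding of methylation patterns, and the choice of a normalized Shannon entropy of pattern frequencies as the heterogeneity measure are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mean_methylation(patterns):
    """Fraction of methylated calls ('1') over all CpGs in all reads."""
    calls = "".join(patterns)
    return calls.count("1") / len(calls)


def methylation_entropy(patterns):
    """Shannon entropy (bits) of pattern frequencies, divided by CpGs per pattern.

    Assumes every pattern covers the same CpGs, so all strings have equal length.
    """
    n_cpgs = len(patterns[0])
    total = len(patterns)
    counts = Counter(patterns)
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return entropy / n_cpgs


# Hypothetical read sets: '1' = methylated CpG, '0' = unmethylated CpG.
# Two fully concordant epialleles in equal proportion.
bimodal = ["1111"] * 8 + ["0000"] * 8
# Every possible 4-CpG pattern observed exactly once.
mixed = [format(i, "04b") for i in range(16)]

for name, reads in (("bimodal", bimodal), ("mixed", mixed)):
    print(name, mean_methylation(reads), methylation_entropy(reads))
```

Both read sets have the same average methylation level, yet the entropy of their methylation patterns differs, which is the kind of information that phased methylation states carry and per-CpG averages discard.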