Peters Timothy J, Buckley Michael J, Chen Yunshun, Smyth Gordon K, Goodnow Christopher C, Clark Susan J
The Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst, NSW 2010, Australia.
UNSW Sydney, Sydney 2052, Australia.
Nucleic Acids Res. 2021 Nov 8;49(19):e109. doi: 10.1093/nar/gkab637.
Whole genome bisulphite sequencing (WGBS) permits the genome-wide study of single-molecule methylation patterns. One of the key goals of mammalian cell-type identity studies, in both normal differentiation and disease, is to locate differential methylation patterns across the genome. We discuss the most desirable characteristics of DML (differentially methylated locus) and DMR (differentially methylated region) detection tools in a genome-wide context, and select for benchmarking a set of statistical methods that fully or partially satisfy these criteria. Our data simulation strategy is both biologically informed, employing distribution parameters derived from large-scale consortium datasets, and thorough. We report DML detection ability with respect to coverage, group methylation difference, sample size, variability and covariate size, both marginally and jointly, and exhaustively across parameter combinations. We also benchmark these methods on false discovery rate (FDR) control and computational time. We use these results to choose the backend of an expanded version of DMRcate, which we introduce here. DMRcate is an existing DMR detection tool for microarray data that we have now extended to call DMRs from WGBS data. We compare DMRcate with a set of alternative DMR callers using a similarly realistic simulation strategy. We find that DMRcate and RADmeth are the best predictors of DMRs, and that DMRcate is conclusively the fastest.
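The DML benchmarking logic summarized in the abstract (simulate read counts with a known set of true DMLs, test each CpG, adjust for multiple testing, then score power and empirical FDR) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation or any of the benchmarked tools. The beta-binomial read model, the parameter names (`mu`, `phi`, `delta`, `coverage`, `n_per_group`) and all default values are hypothetical. The per-CpG test is a deliberately naive pooled two-proportion z-test that stands in for a real DML caller. Coverage, group difference, sample size and variability map onto `coverage`, `delta`, `n_per_group` and `phi`; covariates are not modelled.

```python
import math
import random


def simulate_cpg(rng, mu, phi, coverage, n_samples):
    """Draw (methylated, total) read counts for one CpG in one group.

    Assumed beta-binomial model (illustrative only): each sample's true
    methylation level ~ Beta with mean `mu` and intra-class correlation `phi`
    (variance mu*(1-mu)*phi); methylated reads ~ Binomial(coverage, level).
    """
    a = mu * (1 - phi) / phi
    b = (1 - mu) * (1 - phi) / phi
    counts = []
    for _ in range(n_samples):
        level = rng.betavariate(a, b)
        methylated = sum(rng.random() < level for _ in range(coverage))
        counts.append((methylated, coverage))
    return counts


def pooled_z_pvalue(group1, group2):
    """Two-sided two-proportion z-test on reads pooled across samples.

    Naive stand-in for a DML test: pooling ignores between-sample variability.
    """
    m1 = sum(m for m, _ in group1)
    n1 = sum(c for _, c in group1)
    m2 = sum(m for m, _ in group2)
    n2 = sum(c for _, c in group2)
    p_pool = (m1 + m2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # all reads methylated or all unmethylated: no evidence
    z = (m1 / n1 - m2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def benchmark(n_cpgs=2000, frac_dml=0.1, delta=0.2, coverage=15,
              n_per_group=3, phi=0.05, alpha=0.05, seed=1):
    """Return (power, empirical FDR) for one hypothetical parameter combination."""
    if not 0 <= delta < 0.4:
        raise ValueError("delta must lie in [0, 0.4) so that mu1 + delta < 1")
    rng = random.Random(seed)
    n_dml = int(frac_dml * n_cpgs)
    pvals, truth = [], []
    for i in range(n_cpgs):
        mu1 = rng.uniform(0.2, 0.6)  # hypothetical baseline methylation range
        is_dml = i < n_dml  # the first n_dml CpGs are the true DMLs
        mu2 = mu1 + delta if is_dml else mu1
        g1 = simulate_cpg(rng, mu1, phi, coverage, n_per_group)
        g2 = simulate_cpg(rng, mu2, phi, coverage, n_per_group)
        pvals.append(pooled_z_pvalue(g1, g2))
        truth.append(is_dml)
    called = [q <= alpha for q in benjamini_hochberg(pvals)]
    tp = sum(c and t for c, t in zip(called, truth))
    fp = sum(c and not t for c, t in zip(called, truth))
    power = tp / n_dml if n_dml else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return power, fdr


if __name__ == "__main__":
    # Marginal sweep over one factor (group difference), others held fixed.
    for delta in (0.1, 0.2, 0.3):
        power, fdr = benchmark(delta=delta)
        print(f"delta={delta:.1f}  power={power:.2f}  empirical FDR={fdr:.2f}")
```

Because pooling reads discards between-sample variability, raising `phi` with the other arguments fixed would be expected to push this naive test's empirical FDR above the nominal `alpha`. That is the kind of failure an FDR-control benchmark is meant to expose; a joint analysis would loop over several arguments at once.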