Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, Ohio, USA.
Department of Biostatistics, University of Michigan, Ann Arbor, 48109, Michigan, USA.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad139.
RNA methylation has recently emerged as an active research area for studying post-transcriptional regulation of gene expression. Various types of RNA methylation, including N6-methyladenosine (m6A), are involved in human disease development. MeRIP-seq, a recently developed sequencing technology that quantifies m6A levels on a transcriptome-wide scale, is increasingly used in RNA epigenetics research in both basic and clinical settings. A fundamental question in RNA methylation data analysis is how to identify differentially methylated regions (DMRs) by contrasting cases and controls. Multiple statistical approaches have recently been developed for DMR detection, but a comprehensive evaluation of these methods is lacking. Here, we thoroughly assess all eight existing methods for DMR calling, using both synthetic and real data. Our simulation adopts a Gamma-Poisson model and a logit-linear framework, and accommodates various sample sizes and DMR proportions for benchmarking. For all methods, sensitivity is low among regions with low input levels, but it can be boosted drastically by increasing the sample size. TRESS and exomePeak2 perform best in terms of detection precision, FDR, type I error control and runtime, although both are hampered by low sensitivity. DRME and exomePeak achieve high sensitivity at the expense of inflated FDR and type I error. Analyses of three real datasets suggest that the methods differ in the lengths of the DMRs they identify and in the regions that each method uniquely discovers.
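To make the benchmarking setup concrete, the following is a minimal Python sketch of how a Gamma-Poisson, logit-linear simulation and the reported metrics could look. It is not the authors' simulation code. The function names (`simulate_counts`, `detection_metrics`), the parameter values (dispersion, effect size, abundance distribution) and the way IP and input rates are split are all illustrative assumptions; the abstract names only the model family, the varying sample sizes and DMR proportions, and the metrics.

```python
import numpy as np

def simulate_counts(n_regions=1000, n_per_group=3, dmr_prop=0.1,
                    dispersion=0.1, seed=0):
    """Hypothetical MeRIP-seq-like simulator (all parameter values are assumptions)."""
    rng = np.random.default_rng(seed)
    x = np.repeat([0, 1], n_per_group)            # 0 = control, 1 = case
    is_dmr = rng.random(n_regions) < dmr_prop     # true DMR labels
    alpha = rng.normal(0.0, 1.0, n_regions)       # baseline logit methylation
    # Effect size is non-zero only for true DMRs (assumed magnitude 1.5).
    beta = np.where(is_dmr, rng.choice([-1.5, 1.5], n_regions), 0.0)
    # Logit-linear framework: logit(mu_ij) = alpha_i + beta_i * x_j
    eta = alpha[:, None] + beta[:, None] * x[None, :]
    mu = 1.0 / (1.0 + np.exp(-eta))
    # Assumed region-level abundance shared by IP and input.
    total = rng.lognormal(4.0, 1.0, n_regions)[:, None]
    # Gamma-Poisson: Gamma-distributed rates, then Poisson counts.
    shape = 1.0 / dispersion
    lam_ip = rng.gamma(shape, mu * total / shape)
    lam_input = rng.gamma(shape, (1.0 - mu) * total / shape)
    return rng.poisson(lam_ip), rng.poisson(lam_input), x, is_dmr

def detection_metrics(is_dmr, called):
    """Sensitivity, FDR, precision and type I error from true and called labels."""
    is_dmr = np.asarray(is_dmr, dtype=bool)
    called = np.asarray(called, dtype=bool)
    tp = int(np.sum(is_dmr & called))
    fp = int(np.sum(~is_dmr & called))
    fn = int(np.sum(is_dmr & ~called))
    tn = int(np.sum(~is_dmr & ~called))
    # max(..., 1) avoids division by zero when a denominator is empty.
    return {
        "sensitivity": tp / max(tp + fn, 1),
        "fdr": fp / max(tp + fp, 1),
        "precision": tp / max(tp + fp, 1),
        "type1_error": fp / max(fp + tn, 1),
    }
```

In such a sketch, `n_per_group` and `dmr_prop` correspond to the sample sizes and DMR proportions that the benchmark varies, and the boolean calls returned by any DMR method can be passed to `detection_metrics` together with `is_dmr`.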