Liu Yuanhang, Wilson Desiree, Leach Robin J, Chen Yidong
Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA.
Department of Cellular and Structural Biology, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA.
BMC Genomics. 2016 Aug 18;17 Suppl 4(Suppl 4):432. doi: 10.1186/s12864-016-2794-z.
Since its initial discovery in 1975, DNA methylation has been intensively studied and shown to be involved in various biological processes, such as development, aging and tumor progression. Many experimental techniques have been developed to measure the level of DNA methylation. Methyl-CpG binding domain-based capture followed by high-throughput sequencing (MBDCap-seq) is a widely used method for characterizing DNA methylation patterns in a genome-wide manner. However, current methods for processing MBDCap-seq datasets do not take into account the region-specific genomic characteristics that might affect measurements of the amount of methylated DNA (signal) and of background fluctuation (noise). Thus, specific software needs to be developed for MBDCap-seq experiments.
A new differential methylation quantification algorithm for MBDCap-seq, MBDDiff, was implemented. To evaluate the performance of the MBDDiff algorithm, a set of simulated signals based on negative binomial and Poisson distributions, with parameters estimated from real MBDCap-seq datasets and accompanied by different levels of background noise, was generated. MBDDiff was then compared against a set of commonly used algorithms for MBDCap-seq data analysis in terms of area under the ROC curve (AUC), number of false discoveries and statistical power. In addition, we demonstrated the effectiveness of the MBDDiff algorithm on a set of in-house prostate cancer samples, endometrial cancer samples published earlier, and a set of public-domain triple negative breast cancer samples, to identify potential factors that contribute to cancer development and recurrence.
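The simulation-based evaluation described above can be illustrated with a minimal sketch. It is not the MBDDiff implementation: the scoring rule (absolute log ratio of summed counts), the parameter values, and all function names (`poisson`, `neg_binomial`, `simulate`, `auc`) are assumptions chosen for illustration. It draws the methylation signal from a negative binomial distribution, adds Poisson background noise, and measures how well a naive score separates truly differential regions, using AUC.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def neg_binomial(rng, mean, size):
    """Negative binomial sample as a gamma-Poisson mixture.

    The result has mean `mean` and variance mean + mean**2 / size.
    """
    return poisson(rng, rng.gammavariate(size, mean / size))


def simulate(rng, n_regions=400, n_reps=3, frac_diff=0.2, fold=3.0, noise=2.0):
    """Simulate two-group counts per region; return (labels, scores).

    All defaults are illustrative assumptions, not values from the paper.
    """
    labels, scores = [], []
    for _ in range(n_regions):
        is_diff = rng.random() < frac_diff
        base = rng.uniform(5.0, 30.0)
        group_means = (base, base * fold if is_diff else base)
        totals = []
        for mean in group_means:
            # Observed count = NB signal + Poisson background noise.
            counts = [neg_binomial(rng, mean, 5.0) + poisson(rng, noise)
                      for _ in range(n_reps)]
            totals.append(sum(counts))
        # Naive differential score: absolute log ratio with a pseudocount.
        scores.append(abs(math.log((totals[1] + 1) / (totals[0] + 1))))
        labels.append(is_diff)
    return labels, scores


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count 0.5."""
    pos = [s for flag, s in zip(labels, scores) if flag]
    neg = [s for flag, s in zip(labels, scores) if not flag]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    for noise in (1.0, 10.0, 40.0):
        labels, scores = simulate(random.Random(0), noise=noise)
        print(f"noise={noise:5.1f}  AUC={auc(labels, scores):.3f}")
```

Running the script prints one AUC per background-noise level, which is the kind of comparison the evaluation relies on; a real benchmark would replace the naive score with the p-values or statistics reported by each method under test.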
In this paper we developed an algorithm, MBDDiff, designed specifically for datasets derived from MBDCap-seq. MBDDiff contains three modules: quality assessment of datasets and quantification of DNA methylation; determination of differential methylation of promoter regions; and visualization functionalities. Simulation results suggest that MBDDiff performs better than MEDIPS and DESeq in terms of AUC and the number of false discoveries at different levels of background noise. MBDDiff outperforms MEDIPS as background noise increases, but the two perform comparably when the noise level is lower. By applying MBDDiff to several MBDCap-seq datasets, we were able to identify potential targets that contribute to the corresponding biological processes. Taken together, MBDDiff provides users with an accurate differential methylation analysis for data generated by MBDCap-seq, especially under noisy conditions.