Teng Mingxiang, Wang Yadong, Kim Seongho, Li Lang, Shen Changyu, Wang Guohua, Liu Yunlong, Huang Tim H M, Nephew Kenneth P, Balch Curt
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
Comp Funct Genomics. 2012;2012:376706. doi: 10.1155/2012/376706. Epub 2012 Aug 22.
A number of empirical Bayes models, each with different statistical distribution assumptions, have been developed to analyze differential DNA methylation using high-density oligonucleotide tiling arrays. However, it remains unclear which model performs best. In particular, when differentially methylated regions are analyzed for conserved and functional sequence characteristics (e.g., enrichment of transcription factor-binding sites (TFBSs)), the sensitivity of such analyses under the various empirical Bayes models is unknown. In this paper, five empirical Bayes models were constructed, based on either a gamma distribution or a log-normal distribution, to identify differentially methylated loci and their cell division-dependent (1, 3, and 5) and drug treatment-dependent (cisplatin) methylation patterns. While differential methylation patterns generated by the log-normal models were enriched with numerous TFBSs, we observed almost no TFBS-enriched sequences using the gamma-based models. Statistical and biological results suggest that empirical Bayes models based on a log-normal, rather than a gamma, distribution provide a highly accurate and precise method for differential methylation microarray analysis. In addition, we present one of the log-normal models for differential methylation analysis and test its reproducibility in a simulation study. We believe this research to be the first extensive comparison of statistical models for the analysis of differential DNA methylation, an important biological phenomenon that precisely regulates gene transcription.