Department of Biostatistics and Epidemiology, Georgia Health Sciences University, Augusta, GA 30912-4900, USA.
Genet Epidemiol. 2013 May;37(4):377-82. doi: 10.1002/gepi.21726. Epub 2013 Apr 1.
Epigenetic changes, especially DNA methylation at CpG loci, have important implications in cancer and other complex diseases. With the development of next-generation sequencing (NGS), it is now feasible to generate data that interrogate differences in methylation status at loci genome-wide using a case-control design. However, a proper and efficient statistical test is lacking, and there are several challenges. First, unlike methylation experiments using microarrays, where there is one measure of methylation per individual at a particular CpG site, here we have counts of methylated and unmethylated alleles for each individual. Second, owing to the nature of sample preparation, the measured methylation reflects the methylation status of the mixture of cells that went into the sample. The underlying distribution of the measured methylation level is therefore unknown, and a robust test is more desirable than a parametric approach. Third, NGS currently measures methylation at over 2 million CpG sites, so any statistical test has to be computationally efficient to be applied to NGS data. Taking these challenges into account, we propose a test for differential methylation based on clustered data analysis that models the methylation counts. We performed simulations showing that the test is robust under several distributions for the measured methylation levels, has good power, and is computationally efficient. Finally, we apply the test to our NGS data on chronic lymphocytic leukemia. The results indicate that it is a promising and practical test.
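The abstract does not give the form of the test statistic, so the following is only an illustrative sketch of one common clustered-data approach, not the authors' method. It treats each individual as a cluster of reads at a single CpG site, estimates each group's methylation proportion as a ratio of summed counts, and uses a between-individual (ratio-estimator) variance that makes no assumption about how methylation levels are distributed across individuals. The function names, the variance formula, and the normal approximation for the p-value are all assumptions introduced here for illustration.

```python
import math


def group_ratio(meth, total):
    """Pooled methylation proportion and cluster-robust variance for one group.

    meth[i]  = methylated read count for individual i (hypothetical input layout)
    total[i] = total read count for individual i at the same CpG site
    """
    k = len(meth)
    n_sum = sum(total)
    p = sum(meth) / n_sum
    # Residuals are taken per individual (cluster), so reads within one
    # individual are allowed to be correlated.
    ss = sum((m - p * n) ** 2 for m, n in zip(meth, total))
    var = k / (k - 1) * ss / n_sum ** 2
    return p, var


def diff_methylation_test(case_meth, case_total, ctrl_meth, ctrl_total):
    """Two-sided z-test for a difference in methylation proportion.

    Assumption: the statistic is approximately standard normal under the
    null, which needs a reasonable number of individuals per group.
    """
    p1, v1 = group_ratio(case_meth, case_total)
    p0, v0 = group_ratio(ctrl_meth, ctrl_total)
    se = math.sqrt(v1 + v0)
    if se == 0.0:
        # No between-individual variation at all: the statistic is undefined.
        return float("nan"), float("nan")
    z = (p1 - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


if __name__ == "__main__":
    # Made-up counts for four cases and four controls at one CpG site.
    z, p = diff_methylation_test([18, 20, 15, 22], [20, 25, 20, 25],
                                 [5, 8, 6, 4], [20, 25, 20, 25])
    print(z, p)
```

Because the statistic needs only a few sums per site, a sketch like this runs in time linear in the number of individuals, which is the kind of cost that matters when the same test is repeated across millions of CpG sites.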