Fuady Angga M, Lent Samantha, Sarnowski Chloé, Tintle Nathan L
Medical Statistics, Department of Biomedical Data Sciences, Leiden University Medical Center, Einthovenweg 20, 2333 ZC, Leiden, Netherlands.
Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Avenue, Boston, MA, 02118, USA.
BMC Genet. 2018 Sep 17;19(Suppl 1):72. doi: 10.1186/s12863-018-0647-2.
The rise in popularity and accessibility of DNA methylation data to evaluate epigenetic associations with disease has led to numerous methodological questions. As part of GAW20, our working group of 8 research groups focused on gene searching methods.
Although the methods varied, we identified 3 main themes within our group. First, many groups tackled the question of how best to use pedigree information in downstream analyses, finding that (a) the use of kinship matrices is common practice, (b) ascertainment corrections may be necessary, and (c) pedigree information may be useful for identifying parent-of-origin effects. Second, many groups compared multimarker and single-marker tests. On simulated data, multimarker tests had modestly higher power than single-marker methods; on real data, they identified additional associations that single-marker methods missed, including a gene with a strong biological interpretation. Finally, some groups explored methods for combining single-nucleotide polymorphism (SNP) and DNA methylation data in a single association analysis.
A causal inference method showed promise for discovering new mechanisms of SNP activity; gene-based methods of summarizing SNP and DNA methylation data also showed promise. Although numerous questions remain in the analysis of DNA methylation data, our discussions at GAW20 suggest some emerging best practices.