Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Methods Mol Biol. 2022;2432:137-151. doi: 10.1007/978-1-0716-1994-0_11.
In this chapter, we review imputation in the context of DNA methylation, focusing on a penalized functional regression (PFR) method we previously developed. We start with a brief review of DNA methylation, of genomic and epigenomic contexts where imputation has proven beneficial in practice, and of statistical and computational methods proposed for DNA methylation in the recent literature (Subheading 1). The rest of the chapter (Subheadings 2-4) reviews in detail our PFR method for across-platform imputation, which incorporates nonlocal information using a penalized functional regression framework. Subheading 2 introduces commonly employed technologies for DNA methylation measurement and describes the real dataset we used in developing our method: the acute myeloid leukemia (AML) dataset from The Cancer Genome Atlas (TCGA) project. Subheading 3 reviews the method step by step: data harmonization prior to model building, building of the penalized functional regression model, post-imputation quality filtering, and imputation quality assessment. Subheading 4 shows the performance of our method in both simulations and the TCGA AML dataset, demonstrating that the penalized functional regression model is a valuable across-platform imputation tool for DNA methylation data, particularly because of its ability to boost statistical power for subsequent epigenome-wide association studies. Finally, Subheading 5 provides future perspectives on imputation for DNA methylation data.