Zheng Yinan, Joyce Brian T, Liu Lei, Zhang Zhou, Kibbe Warren A, Zhang Wei, Hou Lifang
Center for Population Epigenetics, Robert H. Lurie Comprehensive Cancer Center and Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA.
Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA.
Nucleic Acids Res. 2017 Sep 6;45(15):8697-8711. doi: 10.1093/nar/gkx587.
DNA methylation in repetitive elements (RE) suppresses their mobility and maintains genomic stability, and reduced RE methylation is frequently observed in tumor and/or surrogate tissues. Averaging methylation across RE in the genome is widely used to quantify global methylation. However, methylation may vary among specific RE and play diverse roles in disease development, so averaging methylation across RE may lose significant biological information. Current bisulfite sequencing platforms map short reads ambiguously and are costly, which makes them impractical for quantifying locus-specific RE methylation. Although microarray-based approaches (particularly Illumina's Infinium methylation arrays) provide cost-effective and robust genome-wide methylation quantification, the number of interrogated CpGs in RE remains limited. We report a random forest-based algorithm (and corresponding R package, REMP) that can accurately predict genome-wide locus-specific RE methylation based on Infinium array profiling data. We validated its prediction performance using alternative sequencing and microarray data. Testing its clinical utility with The Cancer Genome Atlas data demonstrated that our algorithm provides more comprehensive locus-specific RE methylation information that can be readily applied to large human studies in a cost-effective manner. Our work has the potential to improve our understanding of the role of global methylation in human diseases, especially cancer.
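The point that averaging across RE can hide locus-specific changes can be illustrated with a toy calculation. The sketch below is not the REMP algorithm and uses no real data: the locus names, the beta values, and the helper functions `global_mean` and `locus_differences` are all invented for illustration.

```python
from statistics import mean

# Hypothetical beta values (fraction methylated, 0 to 1) at four RE loci.
# Locus names and numbers are invented for illustration only.
normal = {"Alu_1": 0.80, "Alu_2": 0.80, "L1_1": 0.80, "L1_2": 0.80}
tumor = {"Alu_1": 0.95, "Alu_2": 0.95, "L1_1": 0.95, "L1_2": 0.35}


def global_mean(sample):
    """Average methylation across all RE loci in one sample."""
    return mean(sample.values())


def locus_differences(a, b):
    """Per-locus methylation difference (b minus a), rounded to 2 decimals."""
    return {locus: round(b[locus] - a[locus], 2) for locus in a}


# The averaged ("global") measure compares one number per sample.
global_gap = round(global_mean(tumor) - global_mean(normal), 2)

# The locus-specific view keeps one difference per RE locus.
per_locus = locus_differences(normal, tumor)

print("global difference:", global_gap)
print("per-locus differences:", per_locus)
```

In this made-up case the two samples have the same average methylation, yet one locus (`L1_2`) differs by 0.45, a change that only the locus-specific view reveals.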