National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA.
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK.
BMC Genomics. 2018 May 23;19(1):390. doi: 10.1186/s12864-018-4766-y.
Bisulfite sequencing is widely employed to study the role of DNA methylation in disease; however, the data suffer from biases due to coverage depth variability. Imputation of methylation values at low-coverage sites may mitigate these biases while also identifying important genomic features associated with predictive power.
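To make the coverage problem concrete: at a CpG covered by n reads, the methylation estimate is the fraction of those reads that are methylated. At n = 2 that estimate can only be 0, 0.5, or 1. The sketch below computes per-site estimates from read counts and flags sites under a depth cutoff as candidates for imputation. The function name `methylation_estimates` and the cutoff of 10 reads are hypothetical choices for illustration, not values taken from BoostMe.

```python
def methylation_estimates(counts, min_depth=10):
    """Return (beta, is_low_coverage) for each CpG site.

    counts: list of (methylated_reads, unmethylated_reads) tuples.
    min_depth: hypothetical cutoff; sites with fewer reads are treated as
    low-quality estimates that an imputation step would replace.
    """
    results = []
    for meth, unmeth in counts:
        depth = meth + unmeth
        # Fraction of methylated reads; undefined when no reads cover the site.
        beta = meth / depth if depth > 0 else None
        results.append((beta, depth < min_depth))
    return results


# Toy read counts, invented for illustration only.
sites = [(18, 2), (1, 1), (0, 0), (5, 15)]
for beta, low in methylation_estimates(sites):
    print(beta, low)
```

The second site shows the bias: its 0.5 estimate rests on just two reads, so a single extra methylated read would move it to 0.67.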
Here we describe BoostMe, a method for imputing low-quality DNA methylation estimates within whole-genome bisulfite sequencing (WGBS) data. BoostMe uses a gradient boosting algorithm, XGBoost, and leverages information from multiple samples for prediction. We find that BoostMe outperforms existing algorithms in speed and accuracy when applied to WGBS of human tissues. Furthermore, we show that imputation improves concordance between WGBS and the MethylationEPIC array at low WGBS depth, suggesting improved WGBS accuracy after imputation.
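As a rough illustration of gradient-boosted imputation that draws on several samples, here is a minimal sketch. It is not BoostMe's implementation. It replaces XGBoost with a hand-written boosting loop over depth-1 regression trees (stumps) under squared-error loss. The features (the same CpG's methylation level in two other samples) and the toy values are hypothetical. The model is trained on well-covered sites and then queried at a site whose own estimate is low quality.

```python
# Hypothetical sketch: gradient boosting with depth-1 trees (stumps) and
# squared-error loss, standing in for XGBoost. Not BoostMe's implementation.

def fit_stump(X, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must put samples on both sides
            l_mean = sum(left) / len(left)
            r_mean = sum(right) / len(right)
            sse = (sum((r - l_mean) ** 2 for r in left)
                   + sum((r - r_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, l_mean, r_mean)
    if best is None:  # every feature is constant: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def predict_stump(stump, row):
    j, threshold, l_val, r_val = stump
    return l_val if row[j] <= threshold else r_val


def fit_boosted(X, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit stumps to the residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict_boosted(model, row):
    base, lr, stumps = model
    value = base + lr * sum(predict_stump(s, row) for s in stumps)
    return min(1.0, max(0.0, value))  # methylation levels lie in [0, 1]


# Toy training data (invented): each row holds one CpG's methylation level in
# two other samples; y is this sample's level at well-covered sites.
X_train = [[0.9, 0.85], [0.95, 0.9], [0.1, 0.05],
           [0.05, 0.1], [0.5, 0.6], [0.8, 0.9]]
y_train = [0.9, 0.92, 0.08, 0.06, 0.55, 0.85]
model = fit_boosted(X_train, y_train)

# Impute a hypothetical low-coverage site from the other samples' values.
print(round(predict_boosted(model, [0.9, 0.88]), 2))
```

A real pipeline would call the XGBoost library itself, which adds deeper trees, regularization, and far faster split finding. The loop above shows only the core idea: each round fits a small tree to what the current model still gets wrong.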
Our findings support the use of BoostMe as a preprocessing step for WGBS analysis.