Institute for Molecular Medicine Finland, University of Helsinki, 00290 Helsinki, Uusimaa, Finland.
Department of Public Health, University of Helsinki, 00290 Helsinki, Uusimaa, Finland.
Epigenomics. 2019 Oct;11(13):1469-1486. doi: 10.2217/epi-2019-0206. Epub 2019 Aug 30.
Smoking strongly influences DNA methylation, and current and never smokers exhibit distinct methylation profiles. To make smoking-associated methylation signals more practically applicable, we used machine learning to train a classifier that predicts smoking status. We report the classifier's prediction performance on three independent whole-blood datasets, demonstrating its robustness and global applicability. Furthermore, we examine the reasons for biologically meaningful misclassifications through comprehensive phenotypic evaluation. The major contribution of our classifier is its global applicability: users do not need to determine a threshold value for each dataset to predict smoking status. We provide an R package, EpiSmokEr (Epigenetic Smoking status Estimator), to facilitate the use of our classifier for predicting smoking status in future studies.
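The threshold-free idea described in the abstract can be illustrated with a minimal sketch. It assumes a logistic-type model that turns methylation beta values into class probabilities and assigns the most probable class, so no dataset-specific cutoff has to be chosen. The CpG names (`cpg_a`, `cpg_b`), intercepts, and weights below are hypothetical placeholders; this is written in Python for illustration and is not the EpiSmokEr implementation, its API, or its coefficients.

```python
import math

# Hypothetical classes and coefficients, for illustration only.
# These are NOT the CpG sites or weights used by EpiSmokEr.
CLASSES = ["never", "current"]
INTERCEPTS = {"never": -3.0, "current": 3.0}
WEIGHTS = {
    "never": {"cpg_a": 4.0, "cpg_b": 2.0},
    "current": {"cpg_a": -4.0, "cpg_b": -2.0},
}


def class_probabilities(betas):
    """Softmax over per-class linear scores of methylation beta values."""
    scores = {
        c: INTERCEPTS[c] + sum(w * betas[cpg] for cpg, w in WEIGHTS[c].items())
        for c in CLASSES
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def predict_smoking_status(betas):
    """Return the most probable class; no per-dataset cutoff is tuned."""
    probs = class_probabilities(betas)
    return max(probs, key=probs.get)
```

Calling `predict_smoking_status({"cpg_a": 0.9, "cpg_b": 0.8})` with these placeholder weights returns one of the class labels directly. Because the decision rule is "take the highest probability", the same call can be applied unchanged to samples from a different dataset.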