Department of Epidemiology, Erasmus MC University Medical Center Rotterdam, Dr. Molewaterplein 40, 3015 GD, Rotterdam, The Netherlands.
Department of Genetic Identification, Erasmus MC University Medical Center Rotterdam, Dr. Molewaterplein 40, 3015 GD, Rotterdam, The Netherlands.
Eur J Epidemiol. 2019 Nov;34(11):1055-1074. doi: 10.1007/s10654-019-00555-w. Epub 2019 Sep 7.
Inferring a person's smoking habit and history from blood is relevant for complementing or replacing self-reports in epidemiological and public health research, and for forensic applications. However, a finite DNA methylation marker set and a validated statistical model based on a large dataset are not yet available. Employing 14 epigenome-wide association studies for marker discovery, and using data from six population-based cohorts (N = 3764) for model building, we identified 13 CpGs most suitable for inferring smoking versus non-smoking status from blood, with a cumulative Area Under the Curve (AUC) of 0.901. Internal fivefold cross-validation yielded an average AUC of 0.897 ± 0.137, while external model validation in an independent population-based cohort (N = 1608) achieved an AUC of 0.911. These 13 CpGs also provided accurate inference of current (cross-validation average AUC 0.925 ± 0.021, external validation AUC 0.914), former (0.766 ± 0.023, 0.699) and never smoking (0.830 ± 0.019, 0.781) status, allowed inference of pack-years in current smokers (10 pack-years: 0.800 ± 0.068, 0.796; 15 pack-years: 0.767 ± 0.102, 0.752), and allowed inference of smoking cessation time in former smokers (5 years: 0.774 ± 0.024, 0.760; 10 years: 0.766 ± 0.033, 0.764; 15 years: 0.767 ± 0.020, 0.754). Applying the model to children revealed highly accurate inference of their true non-smoking status (6 years of age: accuracy 0.994, N = 355; 10 years: 0.994, N = 309), suggesting that prenatal and passive smoking exposure do not affect model application in adults. This finite set of DNA methylation markers allows accurate inference of smoking habit from blood, with accuracy comparable to that of plasma cotinine, as well as inference of smoking history; we envision it becoming useful in epidemiology and public health research, and in medical and forensic applications.
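The abstract reports performance as AUCs, as fivefold cross-validation averages with a ± spread, and at pack-year thresholds. The Python sketch below illustrates how such quantities are commonly computed. It assumes that "average AUC ± x" means the mean and sample standard deviation of the per-fold AUCs, and it uses the standard definition of pack-years (packs of 20 cigarettes per day multiplied by years smoked). The scoring model `fit_centroid_scorer`, the synthetic beta values, and all function names are hypothetical illustrations. They are not the authors' statistical model or data.

```python
import random
import statistics


def auc(scores, labels):
    # Rank-based (Mann-Whitney) AUC: the fraction of (positive, negative)
    # pairs in which the positive case (label 1) scores higher; ties count 1/2.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(X, y):
    # Hypothetical stand-in model (NOT the published one): the weight of each
    # CpG is mean beta in smokers minus mean beta in non-smokers, so sites
    # hypomethylated in smokers get negative weights. Score = weighted sum.
    smokers = [x for x, label in zip(X, y) if label == 1]
    others = [x for x, label in zip(X, y) if label == 0]
    weights = [statistics.fmean([r[j] for r in smokers])
               - statistics.fmean([r[j] for r in others])
               for j in range(len(X[0]))]
    return lambda x: sum(w * v for w, v in zip(weights, x))


def stratified_folds(labels, k=5, seed=0):
    # Deal each class's shuffled indices round-robin into k folds, so every
    # fold contains both classes (given at least k samples per class).
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_validated_auc(X, y, k=5, seed=0):
    # Fit on k-1 folds, compute AUC on the held-out fold, then summarise the
    # k fold AUCs as mean and sample standard deviation.
    fold_aucs = []
    for test_idx in stratified_folds(y, k, seed):
        test_set = set(test_idx)
        train = [i for i in range(len(y)) if i not in test_set]
        score = fit_centroid_scorer([X[i] for i in train],
                                    [y[i] for i in train])
        fold_aucs.append(auc([score(X[i]) for i in test_idx],
                             [y[i] for i in test_idx]))
    return statistics.mean(fold_aucs), statistics.stdev(fold_aucs), fold_aucs


def pack_years(cigarettes_per_day, years_smoked):
    # Standard definition: packs per day (20 cigarettes per pack) x years.
    return cigarettes_per_day / 20 * years_smoked


if __name__ == "__main__":
    # Synthetic beta values for 13 hypothetical CpGs. Smokers (label 1) are
    # shifted down by 0.05 at every site. Illustrative numbers, not study data.
    rng = random.Random(42)
    X, y = [], []
    for i in range(200):
        label = i % 2
        X.append([rng.gauss(0.60 - 0.05 * label, 0.05) for _ in range(13)])
        y.append(label)
    mean_auc, sd_auc, _ = cross_validated_auc(X, y)
    print(f"5-fold AUC: {mean_auc:.3f} ± {sd_auc:.3f}")
    print(pack_years(20, 10))  # one pack a day for 10 years -> 10.0
```

Stratifying the folds by class keeps smokers and non-smokers in every held-out fold, so each fold AUC is defined. A binary pack-years outcome such as "10 pack-years or more" could then be scored with the same `auc` function.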