Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA.
Yale School of Medicine, New Haven, CT, USA.
Addict Biol. 2019 Sep;24(5):1056-1065. doi: 10.1111/adb.12670. Epub 2018 Oct 4.
A validated, scalable approach to characterizing (phenotyping) smoking status is needed to facilitate genetic discovery. Using established DNA methylation sites from blood samples as a criterion standard for smoking behavior, we compared three candidate electronic medical record (EMR) smoking metrics based on longitudinal EMR text notes. With data from the Veterans Aging Cohort Study (VACS), we employed a validated algorithm to translate each smoking-related text note into a current, past or never category. We then compared three alternative summary characterizations of smoking (most recent, modal and trajectories) using descriptive statistics and Spearman's correlation coefficients. Logistic regression and area under the curve analyses were used to compare the associations of these phenotypes with the DNA methylation sites cg05575921 and cg03636183, which are known to have strong associations with current smoking. DNA methylation data were available from the VACS Biomarker Cohort (VACS-BC), a sub-study of VACS. We also considered whether the associations differed by the certainty of trajectory group assignment (<0.80/≥0.80). Among 140 152 VACS participants, the frequency of EMR summary smoking phenotypes varied by the metric chosen: current from 33 to 53 percent, past from 16 to 24 percent and never from 24 to 33 percent. Among pairwise comparisons of the EMR smoking metrics, the correlation was highest between modal and trajectories (rho = 0.89). Among 728 individuals in the VACS-BC, both DNA methylation sites were associated with all three EMR summary metrics (p < 0.001), but the strongest association with both methylation sites was observed for trajectories (p < 0.001). Longitudinal EMR smoking data support using a summary phenotype, the validity of which is enhanced when data are integrated into statistical trajectories.
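As an illustration of the two simpler summary metrics named above, the following minimal Python sketch derives a "most recent" and a "modal" smoking status from one participant's dated note categories. It is not the authors' code: the function names, the `(date, category)` data layout, and the tie-breaking rule for the modal metric (ties go to the category observed most recently) are assumptions made for the example. The trajectory metric requires a fitted statistical model and is not sketched here.

```python
from collections import Counter
from datetime import date

# Each note is a (note_date, category) pair, where category is one of
# "current", "past" or "never". This layout is hypothetical.

def most_recent(notes):
    """Return the category of the latest-dated note."""
    return max(notes, key=lambda n: n[0])[1]

def modal(notes):
    """Return the most frequent category across all notes.

    Assumption: when categories tie on frequency, the tied category
    that was observed most recently wins.
    """
    counts = Counter(cat for _, cat in notes)
    top = max(counts.values())
    tied = {cat for cat, k in counts.items() if k == top}
    # Walk notes from newest to oldest and return the first tied category.
    for _, cat in sorted(notes, key=lambda n: n[0], reverse=True):
        if cat in tied:
            return cat

# Made-up example data for one participant.
notes = [
    (date(2010, 3, 1), "current"),
    (date(2012, 6, 15), "current"),
    (date(2015, 1, 9), "past"),
]

print(most_recent(notes))  # category of the 2015 note
print(modal(notes))        # category that appears most often
```

In this made-up example the two metrics disagree, which shows how the frequency of each phenotype can vary with the metric chosen.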