Dartmouth College, Hanover, NH, USA.
New Hampshire Colonoscopy Registry, Lebanon, NH, USA.
BMC Med Inform Decis Mak. 2019 Jul 25;19(1):143. doi: 10.1186/s12911-019-0864-2.
Approximately 20% of deaths in the US each year are attributable to smoking, yet current practices for recording this health risk in electronic health records (EHRs) have not led to discernible changes in health outcomes. Several groups have developed algorithms for extracting smoking behaviors from clinical notes, but none of these approaches has been assessed with external data to estimate its expected clinical performance.
Previously, we developed an informatics pipeline that extracts smoking status, pack-year history, and cessation date from clinical notes. Here we report the pipeline's performance in clinical implementation, using 1,504 clinical notes matched to an external questionnaire.
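Pack-year history, one of the three fields the pipeline extracts, is conventionally defined as packs smoked per day multiplied by years smoked, with one pack taken as 20 cigarettes. A minimal sketch of that arithmetic follows, assuming the standard 20-cigarette pack; the function name `pack_years` is hypothetical and is not part of the authors' pipeline, which extracts pack-year history from note text rather than computing it.

```python
def pack_years(cigarettes_per_day, years_smoked, cigarettes_per_pack=20):
    """Pack-years = (cigarettes per day / cigarettes per pack) * years smoked.

    Hypothetical helper; assumes a 20-cigarette pack unless told otherwise.
    """
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return cigarettes_per_day / cigarettes_per_pack * years_smoked


# One pack a day for 30 years and half a pack a day for 60 years
# both come to 30 pack-years.
print(pack_years(20, 30))  # 30.0
print(pack_years(10, 60))  # 30.0
```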
We found that 73% of available notes contained no smoking behavior information. For the clinical notes from which we were able to extract information, the weighted Cohen's kappa between the external questionnaire and EHR smoking status was 0.62 (95% CI 0.56-0.69). The correlation between pack-years reported by our pipeline and by the external questionnaire was 0.39 for the 81 notes in which this information was present in both sources. We also assessed lung cancer screening eligibility using notes from individuals identified as never smokers or as smokers with a pack-year history extracted by our pipeline (n = 196). We found a positive predictive value of 85.4%, a negative predictive value of 83.8%, a sensitivity of 63.1%, and a specificity of 94.7%.
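The agreement and screening statistics reported above are standard quantities. The sketch below shows how they could be computed from paired smoking-status labels and from a 2×2 eligibility table. It assumes linear disagreement weights for the kappa, because the abstract does not state which weighting was used, and it treats never < former < current as the category order (also an assumption). The names `weighted_kappa` and `screening_metrics` are hypothetical and are not the authors' code; the example labels and counts are invented for illustration.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories, weighting="linear"):
    """Cohen's weighted kappa for two sets of ordinal labels.

    Assumption: disagreement weights are linear |i - j| / (k - 1) by default,
    or quadratic ((i - j) / (k - 1)) ** 2; the abstract does not say which.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Joint counts of (label_a, label_b) and each rater's marginal counts.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    # Observed weighted disagreement vs. disagreement expected by chance.
    observed = sum(weight(i, j) * joint[(i, j)] for i, j in cells) / n
    expected = sum(weight(i, j) * marg_a[i] * marg_b[j] for i, j in cells) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use the same single category")
    return 1.0 - observed / expected


def screening_metrics(tp, fp, fn, tn):
    """PPV, NPV, sensitivity and specificity from a 2x2 confusion table."""
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels, ordered never < former < current.
cats = ["never", "former", "current"]
questionnaire = ["never", "former", "current", "current", "never"]
ehr_notes = ["never", "current", "current", "former", "never"]
print(round(weighted_kappa(questionnaire, ehr_notes, cats), 3))
print(screening_metrics(tp=8, fp=2, fn=4, tn=16))
```

Weighting matters for ordinal categories: with linear weights, confusing "former" with "current" is penalized half as much as confusing "never" with "current", whereas an unweighted kappa would treat both errors the same.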
We have demonstrated that our pipeline can extract smoking behaviors from unannotated EHR notes when the information is present. This information is reliable enough to identify the patients most likely to be eligible for smoking-related services. Ensuring that smoking information is captured during clinical encounters should remain a high priority.