Tobacco use status from clinical notes using Natural Language Processing and rule based algorithm.

Author information

Hegde Harshad, Shimpi Neel, Glurich Ingrid, Acharya Amit

Publication information

Technol Health Care. 2018;26(3):445-456. doi: 10.3233/THC-171127.

Abstract

BACKGROUND

This cross-sectional retrospective study used Natural Language Processing (NLP) to extract tobacco-use-associated variables from clinical notes documented in the Electronic Health Record (EHR).
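To make the document-level task concrete, here is a minimal, hypothetical sketch of a keyword-based classifier that maps one clinical note to the four classes "Current Smoker", "Past Smoker", "Nonsmoker" and "Unknown". The cue patterns in `PATTERNS` and the function name `classify_note` are illustrative assumptions, not the lexicon or logic of the NLP software the authors used. Real clinical NLP must also handle negation, temporality and section context, which this toy ignores.

```python
import re

# Hypothetical cue patterns, for illustration only; NOT the study's NLP lexicon.
# Order matters: more specific "never"/"former" cues are tried before
# generic current-use cues.
PATTERNS = [
    ("Nonsmoker", re.compile(
        r"\b(never smoked|never smoker|non-?smoker|denies (tobacco|smoking))\b", re.I)),
    ("Past Smoker", re.compile(
        r"\b(former smoker|ex-?smoker|quit smoking|past smoker)\b", re.I)),
    ("Current Smoker", re.compile(
        r"\b(current(ly)? smok\w*|smokes|packs? per day|current tobacco use)\b", re.I)),
]


def classify_note(text: str) -> str:
    """Return the first class whose cue pattern matches the note, else 'Unknown'."""
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return "Unknown"
```

For example, `classify_note("Denies tobacco use.")` falls into the "Nonsmoker" class, while a note with no tobacco cue at all stays "Unknown".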

OBJECTIVE

To develop a rule-based algorithm for determining a patient's current tobacco-use status.

METHODS

Clinical notes (n = 5,371 documents) from 363 patients were mined and classified by NLP software into four classes: "Current Smoker", "Past Smoker", "Nonsmoker" and "Unknown". Two coders manually classified these documents into the same four classes, producing the document-level gold standard classification (DLGSC). The same two coders then derived a tobacco-use status for each patient from the statuses of that patient's individual documents, producing the patient-level gold standard classification (PLGSC). The DLGSC was compared with the NLP output, and the PLGSC with the output of the rule-based algorithm.
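The patient-level step can be illustrated with a small rule-based aggregator over dated document labels. The abstract does not list the study's rules, so the rules below (the most recent informative note decides; a later "Nonsmoker" note cannot erase documented smoking history) and the name `patient_status` are assumptions chosen for illustration only.

```python
from datetime import date


def patient_status(docs: list[tuple[date, str]]) -> str:
    """Derive a patient-level tobacco status from (note_date, document_label) pairs.

    Hypothetical rules, NOT the study's published algorithm:
      1. Ignore 'Unknown' documents.
      2. The most recent informative document decides the status.
      3. A latest 'Nonsmoker' label becomes 'Past Smoker' if any earlier
         informative document recorded current or past smoking.
      4. With no informative documents, return 'Unknown'.
    """
    # Keep only informative documents, oldest first.
    known = sorted((d for d in docs if d[1] != "Unknown"), key=lambda d: d[0])
    if not known:
        return "Unknown"
    latest = known[-1][1]
    earlier_smoking = any(
        label in ("Current Smoker", "Past Smoker") for _, label in known[:-1]
    )
    if latest == "Nonsmoker" and earlier_smoking:
        return "Past Smoker"
    return latest
```

Rule 3 reflects one plausible design choice: a patient documented as smoking at any point is an ever-smoker, so a later "Nonsmoker" note is read as cessation rather than as a contradiction.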

RESULTS

The initial Cohen's kappa (n = 1,000 documents) was 0.9448 (95% CI = 0.9281–0.9615), indicating strong agreement between the two raters. Subsequently, for 371 documents, Cohen's kappa was 0.9889 (95% CI = 0.979–1.000). The document-level F-measures for the four classes were 0.700, 0.753, 0.839 and 0.988, and the patient-level F-measures were 0.580, 0.771, 0.730 and 0.933, respectively.
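The two statistics reported here can be computed as follows. Cohen's kappa is κ = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e is the chance agreement implied by each rater's label frequencies. The per-class F-measure is the harmonic mean of precision P and recall R, 2PR/(P + R), computed one class versus the rest. This is a minimal sketch: the abstract does not state which variance estimator produced the reported confidence intervals, so the simple large-sample standard error used below (with bounds clipped to [−1, 1]) is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b, z=1.96):
    """Cohen's kappa with an approximate 95% CI.

    Assumption: SE = sqrt(p_o * (1 - p_o) / (n * (1 - p_e)**2)), a simple
    large-sample approximation; the study may have used another estimator.
    """
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal frequencies per class.
    p_e = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, (max(-1.0, kappa - z * se), min(1.0, kappa + z * se))


def f_measures(gold, predicted, classes):
    """One-vs-rest F1 per class: 2PR / (P + R), or 0.0 when undefined."""
    scores = {}
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, predicted))
        fp = sum(g != c and p == c for g, p in zip(gold, predicted))
        fn = sum(g == c and p != c for g, p in zip(gold, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[c] = 2 * precision * recall / total if total else 0.0
    return scores
```

Clipping the upper bound at 1 mirrors how an interval such as 0.979–1.000 is usually reported when κ is close to perfect agreement.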

CONCLUSIONS

NLP combined with the rule-based algorithm proved useful for deriving patients' current tobacco-use status. Ongoing work targets further improvements in precision to enhance the tool's translational value.