Data mining to retrieve smoking status from electronic health records in general practice.

Authors

de Boer Annemarijn R, de Groot Mark C H, Groenhof T Katrien J, van Doorn Sander, Vaartjes Ilonca, Bots Michiel L, Haitjema Saskia

Affiliations

Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Heidelberglaan 100, Utrecht 3584 CX, The Netherlands.

Dutch Heart Foundation, The Hague, The Netherlands.

Publication information

Eur Heart J Digit Health. 2022 May 20;3(3):437-444. doi: 10.1093/ehjdh/ztac031. eCollection 2022 Sep.

DOI:10.1093/ehjdh/ztac031
PMID:36712169
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9707867/
Abstract

AIMS

Optimize and assess the performance of an existing data mining algorithm, developed to retrieve smoking status from hospital electronic health records (EHRs), when applied to general practice EHRs.

METHODS AND RESULTS

We optimized an existing algorithm in a training set containing all clinical notes from 498 individuals (75 712 contact moments) from the Julius General Practitioners' Network (JGPN). Each moment was classified as 'current smoker', 'former smoker', 'never smoker', or 'no information'. As a reference standard, we manually reviewed the EHRs. Algorithm performance was assessed in an independent test set (n = 494; 78 129 moments) using precision, recall, and F1-score. In the test set, performance for 'current smoker' was precision 79.7%, recall 78.3%, and F1-score 0.79. For 'former smoker', it was precision 73.8%, recall 64.0%, and F1-score 0.69. For 'never smoker', it was precision 92.0%, recall 74.9%, and F1-score 0.83. At the patient level, performance for ever smoker (current and former smoker combined) was precision 87.9%, recall 94.7%, and F1-score 0.91. For never smoker, it was precision 98.0%, recall 82.0%, and F1-score 0.89. We found a more narrative writing style in general practice EHRs than in hospital EHRs.

CONCLUSION

Data mining can successfully retrieve smoking status information from general practice clinical notes with a good performance for classifying ever and never smokers. Differences between general practice and hospital EHRs call for optimization of data mining algorithms when applied beyond a primary development setting.
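As a consistency check on the reported test-set figures, recall that the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The minimal Python sketch below recomputes F1 from the precision and recall given in the abstract; the function names are illustrative and are not taken from the study's code. Each recomputed value rounds to the reported F1, which also confirms that the patient-level never-smoker F1 of 0.89 is a score on a 0 to 1 scale rather than a percentage. The helper `precision_recall_f1` shows how per-class values would be derived from true-positive, false-positive and false-negative counts; those underlying counts are not reported in the abstract, so any counts passed to it are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Per-class precision, recall and F1 from hypothetical TP/FP/FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Test-set precision and recall as reported in the abstract, converted to fractions.
REPORTED = {
    "current smoker (moment level)": (0.797, 0.783),
    "former smoker (moment level)": (0.738, 0.640),
    "never smoker (moment level)": (0.920, 0.749),
    "ever smoker (patient level)": (0.879, 0.947),
    "never smoker (patient level)": (0.980, 0.820),
}

if __name__ == "__main__":
    for label, (p, r) in REPORTED.items():
        # Round to two decimals, as the abstract does.
        print(f"{label}: F1 = {f1_score(p, r):.2f}")
```

Running the script prints F1 values of 0.79, 0.69, 0.83, 0.91 and 0.89, matching the abstract.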

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949f/9707867/e55e0c786067/ztac031f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949f/9707867/3fc751689ef4/ztac031ga1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949f/9707867/0c588b31a9b5/ztac031f1.jpg
