Methods to improve the quality of smoking records in a primary care EMR database: exploring multiple imputation and pattern-matching algorithms.

Author affiliations

Department of Family Medicine, University of Calgary, G012 Health Sciences Centre, 3330 Hospital Drive NW, Calgary, Alberta, T2N 4N1, Canada.

Department of Community Health Sciences, University of Calgary, 3280 Hospital Drive NW, Calgary, Alberta, T2N 4Z6, Canada.

Publication information

BMC Med Inform Decis Mak. 2020 Mar 14;20(1):56. doi: 10.1186/s12911-020-1068-5.

DOI:10.1186/s12911-020-1068-5

PMID:32171301

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7071570/

Abstract

BACKGROUND

Primary care electronic medical record (EMR) data are emerging as a useful source for secondary uses, such as disease surveillance, health outcomes research, and practice improvement. These data capture clinical details about patients' health status, as well as behavioural risk factors, such as smoking. While the importance of documenting smoking status in a healthcare setting is recognized, the quality of smoking data captured in EMRs is variable. This study was designed to test methods aimed at improving the quality of patient smoking information in a primary care EMR database.

METHODS

EMR data from community primary care settings extracted by two regional practice-based research networks in Alberta, Canada were used. Patients with at least one encounter in the previous 2 years (2016-2018) and having hypertension according to a validated definition were included (n = 48,377). Multiple imputation was tested under two different assumptions for missing data (smoking status is missing at random and missing not-at-random). A third method tested a novel pattern matching algorithm developed to augment smoking information in the primary care EMR database. External validity was examined by comparing the proportions of smoking categories generated in each method with a general population survey.
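The two missing-data assumptions can be illustrated with a small sketch. It is deliberately simplified and is not the study's procedure: it uses no covariates, the imputation probability is fixed rather than drawn from a posterior, and the not-at-random case is represented by a hypothetical `delta` shift on the log-odds scale (an assumption made here for illustration; the abstract does not say how that case was modelled). The toy data are invented and do not reproduce the reported estimates.

```python
import math
import random
from statistics import mean

def shift_log_odds(p, delta):
    """Move probability p by delta on the log-odds scale; delta = 0 returns p."""
    odds = p / (1 - p) * math.exp(delta)
    return odds / (1 + odds)

def impute_once(values, p_smoker, rng):
    """Replace each None with a Bernoulli(p_smoker) draw; keep observed values."""
    return [int(rng.random() < p_smoker) if v is None else v for v in values]

def pooled_prevalence(values, delta=0.0, m=20, seed=0):
    """Average the smoker proportion over m imputed copies of the data."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    # Probability used for missing records: the observed proportion,
    # optionally shifted by the (hypothetical) sensitivity parameter delta.
    p_missing = shift_log_odds(mean(observed), delta)
    return mean(mean(impute_once(values, p_missing, rng)) for _ in range(m))

# Toy data (made up): 600 recorded statuses, 150 of them smokers, plus 400 missing.
toy = [1] * 150 + [0] * 450 + [None] * 400
mar = pooled_prevalence(toy, delta=0.0)    # missing records resemble observed ones
mnar = pooled_prevalence(toy, delta=-1.0)  # missing records assumed less likely to smoke
print(f"MAR-style estimate:  {mar:.3f}")
print(f"MNAR-style estimate: {mnar:.3f}")
```

The point of the sketch is only that the same incomplete data can yield different completed-data estimates depending on what is assumed about the unrecorded patients.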

RESULTS

Among those with hypertension, 40.8% (n = 19,743) had either no smoking information recorded or it was not interpretable and considered missing. Those with missing smoking data differed statistically by demographics, clinical features, and type of EMR system used in the clinic. Both multiple imputation methods produced fully complete smoking status information, with the proportion of current smokers estimated at 25.3% (data missing at random) and 12.5% (data missing not-at-random). The pattern-matching algorithm classified 18.2% of patients as current smokers, similar to the population-based survey (18.9%), but still resulted in missing smoking information for 23.6% of patients. The algorithm was estimated to be 93.8% accurate overall, but varied by smoking status category.
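As a rough illustration of how a pattern-matching approach can recover some statuses while leaving others missing, the sketch below classifies free-text entries with a few regular expressions. The patterns, category labels, and the function name `classify_smoking` are hypothetical and far simpler than any production rule set; they are not the rules developed in the study.

```python
import re

# Hypothetical patterns for illustration only. Order matters: "non-smoker"
# and "ex-smoker" must be tested before the bare word "smoker".
PATTERNS = [
    ("never", re.compile(r"\b(never|non[- ]?smoker|does not smoke)\b", re.I)),
    ("past", re.compile(r"\b(ex[- ]?smoker|former|quit|stopped)\b", re.I)),
    ("current", re.compile(r"\b(current|smoker|smokes|\d+\s*(cig|pack|ppd))", re.I)),
]

def classify_smoking(text):
    """Return 'never', 'past', 'current', or 'unknown' for a free-text entry."""
    if not text:
        return "unknown"
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    # Entries that match no pattern stay missing rather than being guessed.
    return "unknown"

# Made-up example entries.
for entry in ["Non-smoker", "ex-smoker, quit 2010", "smokes 10 cig/day", "n/a"]:
    print(f"{entry!r} -> {classify_smoking(entry)}")
```

Leaving unmatched entries as unknown mirrors the trade-off reported above: a rule-based method avoids guessing, so some records remain missing.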

CONCLUSION

Multiple imputation and algorithmic pattern-matching can be used to improve EMR data post-extraction but the recommended method depends on the purpose of secondary use (e.g. practice improvement or epidemiological analyses).

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/355a/7071570/c3481595446a/12911_2020_1068_Fig1_HTML.jpg
