Implications of Data Extraction and Processing of Electronic Health Records for Epidemiological Research: Observational Study.

Authors

van Essen Melissa H J, Twickler Robin, Weesie Yvette M, Arslan Ilgin G, Groenhof Feikje, Peters Lilian L, Bos Isabelle, Verheij Robert A

Affiliations

Tranzo, School of Social Sciences and Behavioural Research, Tilburg University, Reitse Poort 1, RP126, Professor Cobbenhagenlaan 125, Tilburg, The Netherlands, 31 631978419.

Nivel, Netherlands Institute for Health Services Research, Utrecht, The Netherlands.

Publication information

J Med Internet Res. 2025 Jun 11;27:e64628. doi: 10.2196/64628.

DOI:10.2196/64628
PMID:40498913
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12176071/

Abstract

BACKGROUND

The use of routinely recorded electronic health record (EHR) data is increasingly common, especially in epidemiological research. However, data must be processed and prepared for secondary use, and decisions made during this process could significantly impact research outcomes. A demonstration of the extent of these consequences is necessary.

OBJECTIVE

The aim of this study was to investigate the influence of data processing steps on research outcomes derived from the secondary use of EHR data.

METHODS

EHR data from 8 Dutch general practices from 2019 were used. These practices contributed data to 2 research databases: the Academic General Practitioner Development Network registry and the Nivel Primary Care Database. Data were extracted and processed through distinct extraction, transformation, and loading (ETL) pipelines, allowing the evaluation of the impact of different ETL methods by comparing the 2 datasets in three steps: (1) patient demographics, (2) epidemiology of concordant patients, and (3) health service use of patients with 3 diagnoses. A number of similarity indicators, including the number of contacts, regular consultations and visits, prescriptions, and episodes, were compared between the 2 databases. The outcomes were compared by performing paired samples t tests using 99% CIs. Prevalence, number of prescriptions, and number of regular consultations and visits per 1000 patient years were calculated and compared for 3 diagnoses (diabetes mellitus, urinary tract infection, and cough). These outcomes were compared using the SD.
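The two calculations named above, a paired-samples t test with a 99% CI and a rate per 1000 patient years, can be illustrated with a minimal sketch. This is not the authors' code: the per-practice values are invented, pairing by practice (8 pairs, so 7 degrees of freedom) is an assumption, and the function names are hypothetical. The critical value 3.499 is the standard two-sided 99% table value of Student's t for 7 degrees of freedom.

```python
import math
import statistics

# Two-sided 99% critical value of Student's t for df = 7 (8 pairs),
# taken from standard t tables. Assumption: one pair per practice.
T_CRIT_99_DF7 = 3.499


def paired_t_with_ci(a, b, t_crit):
    """Return the paired t statistic and the CI for the mean difference a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample SD with n - 1 denominator).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    t_stat = mean_diff / se
    half_width = t_crit * se
    return t_stat, (mean_diff - half_width, mean_diff + half_width)


def rate_per_1000_patient_years(events, patient_years):
    """Events (e.g. prescriptions) per 1000 patient years of observation."""
    return 1000.0 * events / patient_years


# Hypothetical contacts per patient for the same 8 practices in two databases.
db_a = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2, 5.3]
db_b = [4.6, 4.5, 5.2, 5.1, 4.4, 5.0, 5.6, 4.9]

t_stat, (ci_low, ci_high) = paired_t_with_ci(db_a, db_b, T_CRIT_99_DF7)
# The difference is significant at the 1% level if the 99% CI excludes 0.
significant = ci_low > 0 or ci_high < 0
print(f"t = {t_stat:.2f}, 99% CI = ({ci_low:.3f}, {ci_high:.3f}), significant = {significant}")
```

A CI that excludes 0 corresponds to |t| exceeding the critical value, so either check gives the same decision.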

RESULTS

Differences were observed between the datasets in the number of enrolled patients (Academic General Practitioner Development Network registry: n=47,517; Nivel Primary Care Database: n=44,247). Despite this, patient demographics were similar. All indicator outcomes of the concordant patients showed significant differences between the databases, that is, the number of contacts, prescriptions, and episodes per patient, and the number of regular consultations and visits. Differences in the indicator outcomes for the 3 diagnosis groups varied greatly in SD; however, none of the differences were deemed significant.

CONCLUSIONS

The findings highlight the importance of routine health data users' awareness of the different ETL steps involved. Transparency and shared knowledge about these processes are critical, and making them available for research is necessary. Data processors should share their knowledge regarding their choices, and researchers and policy makers should invest in their knowledge of this type of metadata. Transparency and shared knowledge are particularly important in light of the European Health Data Space and the ever-increasing secondary use of routinely recorded health data. Future research should focus on the role of transparency, joint decision-making, and the minimization of effects of ETL steps, and on the insight into the individual influence of ETL steps on research outcomes. This could stimulate standardized approaches among data processors and researchers, resulting in increased data interoperability.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bd5/12176071/687ed693c2c2/jmir-v27-e64628-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bd5/12176071/1b013b478ee0/jmir-v27-e64628-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bd5/12176071/2bee28dc179b/jmir-v27-e64628-g002.jpg

Similar articles

1
Implications of Data Extraction and Processing of Electronic Health Records for Epidemiological Research: Observational Study.
J Med Internet Res. 2025 Jun 11;27:e64628. doi: 10.2196/64628.
2
Adapting Safety Plans for Autistic Adults with Involvement from the Autism Community.
Autism Adulthood. 2025 May 28;7(3):293-302. doi: 10.1089/aut.2023.0124. eCollection 2025 Jun.
3
Surveillance for Violent Deaths - National Violent Death Reporting System, 50 States, the District of Columbia, and Puerto Rico, 2022.
MMWR Surveill Summ. 2025 Jun 12;74(5):1-42. doi: 10.15585/mmwr.ss7405a1.
4
Use of β-adrenoreceptor drugs and Parkinson's disease incidence in women from the French E3N cohort study.
J Parkinsons Dis. 2025 Apr 29:1877718X251330993. doi: 10.1177/1877718X251330993.
5
Electronic cigarettes for smoking cessation.
Cochrane Database Syst Rev. 2025 Jan 29;1(1):CD010216. doi: 10.1002/14651858.CD010216.pub9.
6
Patient and Public Perceptions of 3D Technologies (Models and Images) to Facilitate Health Care Consultations: Exploratory, Mixed Methods Study.
JMIR Form Res. 2025 Jun 18;9:e65235. doi: 10.2196/65235.
7
Community views on mass drug administration for soil-transmitted helminths: a qualitative evidence synthesis.
Cochrane Database Syst Rev. 2025 Jun 20;6:CD015794. doi: 10.1002/14651858.CD015794.pub2.
8
Using Real Electronic Health Records in Undergraduate Education: Roundtable Discussion.
JMIR Form Res. 2025 Jun 12;9:e60789. doi: 10.2196/60789.
9
Pelvic floor muscle training with feedback or biofeedback for urinary incontinence in women.
Cochrane Database Syst Rev. 2025 Mar 11;3(3):CD009252. doi: 10.1002/14651858.CD009252.pub2.
10
Parents' experiences of psychotherapeutic support on the neonatal unit: A mixed methods systematic review to inform intervention development for a multicultural population.
Nurs Crit Care. 2025 May;30(3):e13194. doi: 10.1111/nicc.13194. Epub 2024 Oct 28.

References cited in this article

1
Data Resource Profile: Nivel Primary Care Database (Nivel-PCD), The Netherlands.
Int J Epidemiol. 2025 Feb 16;54(2). doi: 10.1093/ije/dyaf017.
2
Comparison of observational methods to identify and characterize post-COVID syndrome in the Netherlands using electronic health records and questionnaires.
PLoS One. 2025 Jan 29;20(1):e0318272. doi: 10.1371/journal.pone.0318272. eCollection 2025.
3
Data Resource Profile: Registry of electronic health records of general practices in the north of The Netherlands (AHON).
Int J Epidemiol. 2024 Feb 14;53(2). doi: 10.1093/ije/dyae021.
4
Dutch GP healthcare consumption in COVID-19 heterogeneous regions: an interregional time-series approach in 2020-2021.
BJGP Open. 2024 Jul 29;8(2). doi: 10.3399/BJGPO.2023.0121. Print 2024 Jul.
5
A Natural Language Processing Model for COVID-19 Detection Based on Dutch General Practice Electronic Health Records by Using Bidirectional Encoder Representations From Transformers: Development and Validation Study.
J Med Internet Res. 2023 Oct 4;25:e49944. doi: 10.2196/49944.
6
Care by general practitioners for patients with asthma or COPD during the COVID-19 pandemic.
NPJ Prim Care Respir Med. 2023 Apr 8;33(1):15. doi: 10.1038/s41533-023-00340-z.
7
Completeness and Representativeness of the PHARMO General Practitioner (GP) Data: A Comparison with National Statistics.
Clin Epidemiol. 2023 Jan 5;15:1-11. doi: 10.2147/CLEP.S389598. eCollection 2023.
8
Detection of primary Sjögren's syndrome in primary care: developing a classification model with the use of routine healthcare data and machine learning.
BMC Prim Care. 2022 Aug 9;23(1):199. doi: 10.1186/s12875-022-01804-w.
9
The use of out-of-hours primary care during the first year of the COVID-19 pandemic.
BMC Health Serv Res. 2022 May 21;22(1):679. doi: 10.1186/s12913-022-08096-x.
10
What makes administrative data "research-ready"? A systematic review and thematic analysis of published literature.
Int J Popul Data Sci. 2022 Apr 27;7(1):1718. doi: 10.23889/ijpds.v6i1.1718. eCollection 2022.