What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask.

Authors

Kohane Isaac S, Aronow Bruce J, Avillach Paul, Beaulieu-Jones Brett K, Bellazzi Riccardo, Bradford Robert L, Brat Gabriel A, Cannataro Mario, Cimino James J, García-Barrio Noelia, Gehlenborg Nils, Ghassemi Marzyeh, Gutiérrez-Sacristán Alba, Hanauer David A, Holmes John H, Hong Chuan, Klann Jeffrey G, Loh Ne Hooi Will, Luo Yuan, Mandl Kenneth D, Daniar Mohamad, Moore Jason H, Murphy Shawn N, Neuraz Antoine, Ngiam Kee Yuan, Omenn Gilbert S, Palmer Nathan, Patel Lav P, Pedrera-Jiménez Miguel, Sliz Piotr, South Andrew M, Tan Amelia Li Min, Taylor Deanne M, Taylor Bradley W, Torti Carlo, Vallejos Andrew K, Wagholikar Kavishwar B, Weber Griffin M, Cai Tianxi

Affiliations

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.

Biomedical Informatics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, United States.

Publication information

J Med Internet Res. 2021 Mar 2;23(3):e22219. doi: 10.2196/22219.

DOI:10.2196/22219

PMID:33600347

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7927948/

Abstract

Coincident with the tsunami of COVID-19-related publications, there has been a surge of studies using real-world data, including those obtained from the electronic health record (EHR). Unfortunately, several of these high-profile publications were retracted because of concerns regarding the soundness and quality of the studies and the EHR data they purported to analyze. These retractions highlight that although a small community of EHR informatics experts can readily identify strengths and flaws in EHR-derived studies, many medical editorial teams and otherwise sophisticated medical readers lack the framework to fully critically appraise these studies. In addition, conventional statistical analyses cannot overcome the need for an understanding of the opportunities and limitations of EHR-derived studies. We distill here from the broader informatics literature six key considerations that are crucial for appraising studies utilizing EHR data: data completeness, data collection and handling (eg, transformation), data type (ie, codified, textual), robustness of methods against EHR variability (within and across institutions, countries, and time), transparency of data and analytic code, and the multidisciplinary approach. These considerations will inform researchers, clinicians, and other stakeholders as to the recommended best practices in reviewing manuscripts, grants, and other outputs from EHR-data derived studies, and thereby promote and foster rigor, quality, and reliability of this rapidly growing field.
