Methods for enhancing the reproducibility of biomedical research findings using electronic health records.

Author information

Denaxas Spiros, Direk Kenan, Gonzalez-Izquierdo Arturo, Pikoula Maria, Cakiroglu Aylin, Moore Jason, Hemingway Harry, Smeeth Liam

Affiliations

Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA UK.

Farr Institute of Health Informatics Research, 222 Euston Road, London, UK.

Publication information

BioData Min. 2017 Sep 11;10:31. doi: 10.1186/s13040-017-0151-7. eCollection 2017.

DOI:10.1186/s13040-017-0151-7

PMID:28912836

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5594436/

Abstract

BACKGROUND

The ability of external investigators to reproduce published scientific findings is critical for the evaluation and validation of biomedical research by the wider community. However, a substantial proportion of health research using electronic health records (EHR), that is, data collected and generated during clinical care, is potentially not reproducible, mainly because the implementation details of most data preprocessing, cleaning, phenotyping and analysis approaches are not systematically made available or shared. As the complexity, volume and variety of EHR data sources available for research steadily increase, it is critical to ensure that scientific findings from EHR data are reproducible and replicable by researchers. Reporting guidelines, such as RECORD and STROBE, have set a solid foundation by recommending a series of items for researchers to include in their research outputs. Researchers, however, often lack the technical tools and methodological approaches to implement such recommendations in an efficient and sustainable manner.

RESULTS

In this paper, we review and propose a series of methods and tools used in adjacent scientific disciplines that can enhance the reproducibility of research using electronic health records and enable researchers to report analytical approaches in a transparent manner. Specifically, we discuss the adoption of scientific software engineering principles and best practices such as test-driven development, source code revision control systems, literate programming, and the standardization and re-use of common data management and analytical approaches.

CONCLUSION

The adoption of such approaches will enable scientists to systematically document and share EHR analytical workflows and increase the reproducibility of biomedical research using such complex data sources.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9e3/5594436/63ca0a10611a/13040_2017_151_Fig1_HTML.jpg

Similar articles

The future of Cochrane Neonatal.

Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

Clinical code set engineering for reusing EHR data for research: A review.

J Biomed Inform. 2017 Jun;70:1-13. doi: 10.1016/j.jbi.2017.04.010. Epub 2017 Apr 22.

Analytical code sharing practices in biomedical research.

PeerJ Comput Sci. 2024 Jun 28;10:e2066. doi: 10.7717/peerj-cs.2066. eCollection 2024.

Analytical code sharing practices in biomedical research.

bioRxiv. 2023 Aug 7:2023.07.31.551384. doi: 10.1101/2023.07.31.551384.

Repeat: a framework to assess empirical reproducibility in biomedical research.

BMC Med Res Methodol. 2017 Sep 18;17(1):143. doi: 10.1186/s12874-017-0377-6.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

OpenSAFELY: A platform for analysing electronic health records designed for reproducible research.

Pharmacoepidemiol Drug Saf. 2024 Jun;33(6):e5815. doi: 10.1002/pds.5815.

Facilitating biomedical researchers' interrogation of electronic health record data: Ideas from outside of biomedical informatics.

J Biomed Inform. 2016 Apr;60:376-84. doi: 10.1016/j.jbi.2016.03.004. Epub 2016 Mar 10.

CALIFRAME: a proposed method of calibrating reporting guidelines with FAIR principles to foster reproducibility of AI research in medicine.

JAMIA Open. 2024 Oct 18;7(4):ooae105. doi: 10.1093/jamiaopen/ooae105. eCollection 2024 Dec.

Cited by

A Standardized Guideline for Assessing Extracted Electronic Health Records Cohorts: A Scoping Review.

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:527-536. eCollection 2025.

Optimal Surrogate-Assisted Sampling for Cost-Efficient Validation of Electronic Health Record Outcomes.

Stat Med. 2025 May;44(10-12):e70095. doi: 10.1002/sim.70095.

EHRchitect: An open-source software tool for medical event sequences data extraction from Electronic Health Records.

J Clin Transl Sci. 2025 Mar 26;9(1):e79. doi: 10.1017/cts.2025.55. eCollection 2025.

Transparency in the secondary use of health data: assessing the status quo of guidance and best practices.

R Soc Open Sci. 2025 Mar 26;12(3):241364. doi: 10.1098/rsos.241364. eCollection 2025 Mar.

Utilization of Computable Phenotypes in Electronic Health Record Research: A Review and Case Study in Atopic Dermatitis.

J Invest Dermatol. 2025 May;145(5):1008-1016. doi: 10.1016/j.jid.2024.08.025. Epub 2024 Nov 1.

Development and Validation of a Tool to Identify Patients Diagnosed With Castration-Resistant Prostate Cancer.

JCO Clin Cancer Inform. 2023 Sep;7:e2300085. doi: 10.1200/CCI.23.00085.

Determining prescriptions in electronic healthcare record data: methods for development of standardized, reproducible drug codelists.

JAMIA Open. 2023 Aug 29;6(3):ooad078. doi: 10.1093/jamiaopen/ooad078. eCollection 2023 Oct.

Transforming and evaluating electronic health record disease phenotyping algorithms using the OMOP common data model: a case study in heart failure.

JAMIA Open. 2021 Feb 4;4(3):ooab001. doi: 10.1093/jamiaopen/ooab001. eCollection 2021 Jul.

Assessing the impact of introductory programming workshops on the computational reproducibility of biomedical workflows.

PLoS One. 2020 Jul 8;15(7):e0230697. doi: 10.1371/journal.pone.0230697. eCollection 2020.

Electronic health records and polygenic risk scores for predicting disease risk.

Nat Rev Genet. 2020 Aug;21(8):493-502. doi: 10.1038/s41576-020-0224-1. Epub 2020 Mar 31.
