Kohane Isaac S, Aronow Bruce J, Avillach Paul, Beaulieu-Jones Brett K, Bellazzi Riccardo, Bradford Robert L, Brat Gabriel A, Cannataro Mario, Cimino James J, García-Barrio Noelia, Gehlenborg Nils, Ghassemi Marzyeh, Gutiérrez-Sacristán Alba, Hanauer David A, Holmes John H, Hong Chuan, Klann Jeffrey G, Loh Ne Hooi Will, Luo Yuan, Mandl Kenneth D, Daniar Mohamad, Moore Jason H, Murphy Shawn N, Neuraz Antoine, Ngiam Kee Yuan, Omenn Gilbert S, Palmer Nathan, Patel Lav P, Pedrera-Jiménez Miguel, Sliz Piotr, South Andrew M, Tan Amelia Li Min, Taylor Deanne M, Taylor Bradley W, Torti Carlo, Vallejos Andrew K, Wagholikar Kavishwar B, Weber Griffin M, Cai Tianxi
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.
Biomedical Informatics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, United States.
J Med Internet Res. 2021 Mar 2;23(3):e22219. doi: 10.2196/22219.
Coincident with the tsunami of COVID-19-related publications, there has been a surge of studies using real-world data, including those obtained from the electronic health record (EHR). Unfortunately, several of these high-profile publications were retracted because of concerns regarding the soundness and quality of the studies and the EHR data they purported to analyze. These retractions highlight that although a small community of EHR informatics experts can readily identify strengths and flaws in EHR-derived studies, many medical editorial teams and otherwise sophisticated medical readers lack the framework to fully critically appraise these studies. In addition, conventional statistical analyses cannot overcome the need for an understanding of the opportunities and limitations of EHR-derived studies. We distill here from the broader informatics literature six key considerations that are crucial for appraising studies utilizing EHR data: data completeness, data collection and handling (eg, transformation), data type (ie, codified, textual), robustness of methods against EHR variability (within and across institutions, countries, and time), transparency of data and analytic code, and a multidisciplinary approach. These considerations will inform researchers, clinicians, and other stakeholders as to the recommended best practices in reviewing manuscripts, grants, and other outputs from EHR-derived studies, and thereby foster the rigor, quality, and reliability of this rapidly growing field.