DSI WIND, AP-HP, Paris, France; INSERM, U1142, LIMICS, F-75006, Paris, France; Sorbonne Universités, Paris, France.
DSI WIND, AP-HP, Paris, France.
Comput Methods Programs Biomed. 2019 Nov;181:104804. doi: 10.1016/j.cmpb.2018.10.016. Epub 2018 Nov 9.
Data Quality (DQ) programs are recognized as a critical aspect of new-generation research platforms using electronic health record (EHR) data for building Learning Healthcare Systems. The AP-HP Clinical Data Repository aggregates EHR data from 37 hospitals to enable large-scale research and secondary data analysis. This paper describes the DQ program currently in place at AP-HP and the lessons learned from two DQ campaigns initiated in 2017.
As part of the AP-HP DQ program, two domains - patient identification (PI) and healthcare services (HS) - were selected for DQ campaigns consisting of five phases: defining the scope, measuring, analyzing, improving and controlling DQ. Semi-automated DQ profiling was conducted on two data sets - the PI data set containing 8.8 M patients and the HS data set containing 13,099 consultation agendas and 2,122 care units. Seventeen DQ measures were defined and DQ issues were classified using a unified DQ reporting framework. For each domain, action plans were defined for improving and monitoring prioritized DQ issues.
Eleven DQ issues were identified (8 for the PI data set and 3 for the HS data set) and categorized as completeness (n = 6), conformance (n = 3) and plausibility (n = 2) issues. DQ issues were caused by errors from data originators, ETL issues or limitations of the EHR data entry tool. The action plans included sixteen actions (9 for the PI domain and 7 for the HS domain). Although only partially implemented, the DQ campaigns have already resulted in significant improvement of DQ measures.
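To make the three issue categories concrete, the following is a minimal sketch of how DQ measures could be profiled over patient identification records. It is not the tooling used at AP-HP: the `Patient` fields, the three measures, the five-digit postal code rule, the 1900 lower bound on birth dates and the fixed reference date are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class Patient:
    # Hypothetical record layout; the real PI data set schema is not described here.
    patient_id: str
    birth_date: Optional[date]
    postal_code: Optional[str]


@dataclass
class DQMeasure:
    name: str
    category: str  # "completeness", "conformance" or "plausibility"
    passes: Callable[[Patient], bool]


# Fixed reference date (assumption) so that the plausibility check is reproducible.
REFERENCE_DATE = date(2017, 12, 31)

# Illustrative measures, one per category. Conformance and plausibility
# checks skip missing values, which the completeness check already counts.
MEASURES = [
    DQMeasure(
        "birth_date_present",
        "completeness",
        lambda p: p.birth_date is not None,
    ),
    DQMeasure(
        "postal_code_format",
        "conformance",
        lambda p: p.postal_code is None
        or re.fullmatch(r"\d{5}", p.postal_code) is not None,
    ),
    DQMeasure(
        "birth_date_plausible",
        "plausibility",
        lambda p: p.birth_date is None
        or date(1900, 1, 1) <= p.birth_date <= REFERENCE_DATE,
    ),
]


def profile(patients: list[Patient]) -> dict[str, dict]:
    """Return, for each measure, its category, failure count and failure rate."""
    total = len(patients)
    report = {}
    for measure in MEASURES:
        failures = sum(1 for p in patients if not measure.passes(p))
        report[measure.name] = {
            "category": measure.category,
            "failures": failures,
            "rate": failures / total if total else 0.0,
        }
    return report


if __name__ == "__main__":
    sample = [
        Patient("1", date(1980, 5, 1), "75006"),
        Patient("2", None, "7500"),          # missing birth date, malformed postal code
        Patient("3", date(1850, 1, 1), None),  # implausible birth date
    ]
    for name, result in profile(sample).items():
        print(name, result)
```

Tagging each measure with its category keeps the per-measure failure rates directly reportable under a shared vocabulary, and rerunning the same profile after corrective actions gives a simple way to monitor whether a measure has improved.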
DQ assessments of hospital information systems are largely unpublished. The preliminary results of two DQ campaigns conducted at AP-HP illustrate the benefit of engaging in a DQ program. The adoption of a unified DQ reporting framework enables DQ findings to be communicated in a well-defined manner with a shared vocabulary. Dedicated tooling is needed to automate and extend the scope of the generic DQ program. Specific DQ checks will additionally be defined on a per-study basis to evaluate whether EHR data are fit for specific uses.