Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA.
Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, Florida, USA.
J Am Med Inform Assoc. 2020 Dec 9;27(12):1999-2010. doi: 10.1093/jamia/ocaa245.
To synthesize data quality (DQ) dimensions and assessment methods of real-world data, especially electronic health records, through a systematic scoping review and to assess the practice of DQ assessment in the national Patient-centered Clinical Research Network (PCORnet).
We started with 3 widely cited pieces of DQ literature: 2 reviews, by Chan et al (2010) and Weiskopf et al (2013a), and 1 DQ framework, by Kahn et al (2016). We then expanded our review systematically to cover relevant articles published up to February 2020. We extracted DQ dimensions and assessment methods from these studies, mapped their relationships, and organized them into a synthesized summary of existing DQ dimensions and assessment methods. We reviewed the data checks employed by PCORnet and mapped them to the synthesized DQ dimensions and methods.
We analyzed a total of 3 reviews, 20 DQ frameworks, and 226 DQ studies and extracted 14 DQ dimensions and 10 assessment methods. We found that completeness, concordance, and correctness/accuracy were commonly assessed. Element presence, validity check, and conformance were commonly used DQ assessment methods and were the main focus of the PCORnet data checks.
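To make the three most commonly used assessment methods more concrete, the following is a minimal Python sketch under stated assumptions. It reads element presence as the share of records in which a field is populated, validity check as the share of populated values that fall inside a plausible range, and conformance as the share of populated values drawn from an allowed code set. The `Record` type, its fields, the 20 to 250 beats-per-minute range, and the `{"F", "M", "U"}` code set are hypothetical illustrations, not PCORnet's actual data checks.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


# Hypothetical EHR-like record; field names are illustrative only.
@dataclass
class Record:
    patient_id: str
    birth_date: Optional[str]    # e.g. "1980-02-01", or None if missing
    sex: Optional[str]           # assumed code set: "F", "M", "U"
    heart_rate: Optional[float]  # beats per minute, or None if missing


def _populated(records: Iterable[Record], field: str) -> List:
    """Return the non-missing values of `field` across records."""
    return [getattr(r, field) for r in records if getattr(r, field) is not None]


def element_presence(records: List[Record], field: str) -> Optional[float]:
    """Share of records in which `field` is populated (a completeness measure).
    Returns None when there are no records to assess."""
    if not records:
        return None
    return len(_populated(records, field)) / len(records)


def validity_check(records: List[Record], field: str,
                   lo: float, hi: float) -> Optional[float]:
    """Share of populated values of `field` inside [lo, hi].
    Returns None when no values are populated."""
    values = _populated(records, field)
    if not values:
        return None
    return sum(lo <= v <= hi for v in values) / len(values)


def conformance(records: List[Record], field: str,
                allowed: Set[str]) -> Optional[float]:
    """Share of populated values of `field` drawn from the allowed code set.
    Returns None when no values are populated."""
    values = _populated(records, field)
    if not values:
        return None
    return sum(v in allowed for v in values) / len(values)


# Hypothetical sample data for illustration.
records = [
    Record("p1", "1980-02-01", "F", 72.0),
    Record("p2", None, "M", 310.0),         # missing birth date, implausible heart rate
    Record("p3", "1975-07-15", "X", None),  # nonconforming sex code, missing heart rate
    Record("p4", "1990-11-30", "U", 64.0),
]

print(element_presence(records, "birth_date"))                   # 0.75
print(round(validity_check(records, "heart_rate", 20, 250), 2))  # 0.67
print(conformance(records, "sex", {"F", "M", "U"}))              # 0.75
```

Each function returns a proportion rather than a pass/fail flag, so the same measure can be compared across sites or over time; whether to flag a result would then depend on a threshold chosen for the study at hand.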
Definitions of DQ dimensions and methods were not consistent in the literature, and DQ assessment practice was not evenly distributed (eg, usability and ease-of-use were rarely discussed). Given the complex and heterogeneous nature of real-world data, DQ assessment remains challenging.
The practice of DQ assessment is still limited in scope. Future work is warranted to generate understandable, executable, and reusable DQ measures.