Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
J Am Med Inform Assoc. 2013 Jan 1;20(1):144-51. doi: 10.1136/amiajnl-2011-000681. Epub 2012 Jun 25.
To review the methods and dimensions of data quality assessment in the context of electronic health record (EHR) data reuse for research.
A review of the clinical research literature discussing data quality assessment methodology for EHR data was performed. Using an iterative process, the aspects of data quality being measured and the methods of assessment used were abstracted and categorized.
Five dimensions of data quality were identified: completeness, correctness, concordance, plausibility, and currency. Seven broad categories of data quality assessment methods were also identified: comparison with gold standards, data element agreement, data source agreement, distribution comparison, validity checks, log review, and element presence.
Examination of the methods by which clinical researchers have investigated the quality and suitability of EHR data for research shows that there are fundamental features of data quality, which may be difficult to measure, as well as proxy dimensions. Researchers interested in the reuse of EHR data for clinical research should consider adopting a consistent taxonomy of EHR data quality, remain aware of the task-dependence of data quality, integrate work on data quality assessment from other fields, and adopt systematic, empirically driven, statistically based methods of data quality assessment.
There is currently little consistency or potential generalizability in the methods used to assess EHR data quality. If the reuse of EHR data for clinical research is to become accepted, researchers should adopt validated, systematic methods of EHR data quality assessment.
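As a rough illustration of two of the method categories named in this abstract, element presence (a proxy for completeness) and validity checks (a proxy for plausibility), the following minimal sketch may help. It is not taken from the review: the record layout, the field name `systolic_bp`, the function names, and the 40 to 300 mmHg range are all hypothetical choices made for the example.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def element_presence(records: List[Record], field: str) -> float:
    """Fraction of records where `field` is present and non-empty.

    A simple completeness proxy; returns 0.0 for an empty record list.
    """
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) not in (None, ""))
    return present / len(records)


def validity_failures(
    records: List[Record], field: str, is_valid: Callable[[Any], bool]
) -> List[int]:
    """Indices of records whose recorded value fails a plausibility rule.

    Missing values are skipped here; they are counted by element_presence.
    """
    return [
        i
        for i, r in enumerate(records)
        if r.get(field) not in (None, "") and not is_valid(r[field])
    ]


# Hypothetical toy data, not drawn from any real EHR.
records = [
    {"patient_id": 1, "systolic_bp": 120},
    {"patient_id": 2, "systolic_bp": None},
    {"patient_id": 3, "systolic_bp": 900},  # implausible value
    {"patient_id": 4},                      # field absent
]

# Assumed plausibility range for the example only.
in_range = lambda v: 40 <= v <= 300

print(element_presence(records, "systolic_bp"))             # 0.5
print(validity_failures(records, "systolic_bp", in_range))  # [2]
```

Which fields and thresholds are appropriate depends on the intended research use, in line with the task-dependence of data quality noted in the abstract.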