Reimer Andrew P, Milinovich Alex, Madigan Elizabeth A
Frances Payne Bolton School of Nursing, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106, United States; Cleveland Clinic, 10900 Euclid Avenue, Cleveland, OH 44195, United States.
Cleveland Clinic, 10900 Euclid Avenue, Cleveland, OH 44195, United States.
Int J Med Inform. 2016 Jun;90:40-7. doi: 10.1016/j.ijmedinf.2016.03.006. Epub 2016 Mar 24.
The proliferation and use of electronic medical records (EMRs) in the clinical setting now provide a rich source of clinical data that can be leveraged to support research on patient outcomes, comparative effectiveness, and health systems. Once the large volume and variety of data that robust clinical EMRs provide have been aggregated, the suitability of the data for research purposes must be addressed. The purpose of this paper is therefore two-fold. First, we present a stepwise framework capable of guiding initial data quality assessment when matching multiple data sources, regardless of context or application. Second, we demonstrate a use case, the initial analysis of a longitudinal repository of electronic health record data, that illustrates the first four steps of the framework, and we report the results.
We propose and describe a six-step data quality assessment framework comprising: (1) preliminary analysis, (2) documentation-longitudinal concordance, (3) breadth, (4) data element presence, (5) density, and (6) prediction. The framework was applied to the Transport Data Mart, a data repository containing over 28,000 records for patients who underwent interhospital transfer, including EMRs from the sending hospitalization, the transport, and the receiving hospitalization.
There were a total of 9557 log entries, of which 8139 were successfully matched to corresponding hospital encounters. A total of 2832 were successfully mapped to both the sending and receiving hospital encounters (resulting in a 93% automatic matching rate), with 590 including air medical transport EMR data and thus representing complete cases for testing. Results from Step 2 indicate that once records are identified and matched, relatively few additional records drop off as the matching criteria become stricter, indicating that a proportion of records consistently contain nearly complete data. Measures of central tendency used in Steps 3 and 4 exhibit right skewness, suggesting that a small proportion of records contain the highest number of repeated measures for the measured variables.
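The counts reported above can be turned into simple proportions, and the right-skew observation can be illustrated with a quick mean-versus-median comparison. The following is a minimal sketch, not the authors' method. The variable and function names are assumptions, the list of repeated-measure counts is hypothetical data invented for illustration, and treating the 590 records as a share of the 2832 is one reading of the sentence above. The denominator behind the reported 93% automatic matching rate is not stated in the abstract, so the sketch does not attempt to reproduce that figure.

```python
from statistics import mean, median

# Counts reported in the abstract.
log_entries = 9557
matched_to_encounter = 8139
matched_both_hospitals = 2832
with_air_transport_emr = 590


def proportion(part, whole):
    """Return part/whole as a fraction between 0 and 1."""
    return part / whole


# Share of log entries matched to a hospital encounter.
encounter_match_rate = proportion(matched_to_encounter, log_entries)

# Share of two-hospital matches that also carry air medical transport EMR
# data (assumes the 590 records are a subset of the 2832).
complete_case_share = proportion(with_air_transport_emr, matched_both_hospitals)

# HYPOTHETICAL repeated-measure counts per record, for illustration only.
repeated_measures = [1, 1, 2, 2, 2, 3, 3, 4, 6, 25]


def is_right_skewed(values):
    """Heuristic check: a mean above the median suggests right skew."""
    return mean(values) > median(values)


print(f"Encounter match rate: {encounter_match_rate:.1%}")
print(f"Complete-case share:  {complete_case_share:.1%}")
print(f"Right-skewed (heuristic): {is_right_skewed(repeated_measures)}")
```

The mean-above-median comparison is only a rough indicator. A formal assessment would use a skewness statistic computed on the actual repository data.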
The proposed six-step data quality assessment framework is useful for establishing the metadata for a longitudinal data repository, and it can be replicated by other studies. Practical issues remain to be addressed, including those surrounding the data quality assessments; the most pressing is the need to establish, through testing and application, data quality metrics for benchmarking acceptable levels of EMR data inclusiveness.