Dziadkowiec Oliwier, Callahan Tiffany, Ozkaynak Mustafa, Reeder Blaine, Welton John
University of Colorado, College of Nursing, Anschutz Medical Campus.
University of Colorado, Department of Pediatrics, Anschutz Medical Campus.
EGEMS (Wash DC). 2016 Jun 24;4(1):1201. doi: 10.13063/2327-9214.1201. eCollection 2016.
We examine (1) the appropriateness of using a data quality (DQ) framework developed for relational databases as a data-cleaning tool for a data set extracted from two EPIC databases, and (2) the differences in statistical parameter estimates between a data set cleaned with the DQ framework and a data set not cleaned with it.
The use of data contained within electronic health records (EHRs) has the potential to open doors for a new wave of innovative research. Without adequate preparation of such large data sets for analysis, however, the results might be erroneous, which could affect clinical decision-making or the results of Comparative Effectiveness Research studies.
Two emergency department (ED) data sets extracted from EPIC databases (adult ED and children's ED) were used as examples for examining the five concepts of DQ based on a DQ assessment framework designed for EHR databases. The first data set contained 70,061 visits, and the second contained 2,815,550 visits. SPSS Syntax examples, as well as step-by-step instructions on how to apply the five key DQ concepts to these EHR database extracts, are provided.
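To make the idea of rule-based DQ checks concrete, here is a minimal Python sketch (the authors worked in SPSS Syntax; Python is swapped in purely for illustration). It assumes the five concepts correspond to the rule categories of Kahn et al. (2012): attribute domain constraints, relational integrity rules, historical data rules, state-dependent object rules, and attribute dependency rules. Everything concrete here is hypothetical: the `Visit` fields are not EPIC column names, and the 0 to 120 age range, the 18-year pediatric cutoff, and the disposition values are illustrative choices, not values taken from the study.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set

# Hypothetical, simplified ED visit record; these field names are
# illustrative and are not actual EPIC column names.
@dataclass
class Visit:
    visit_id: int
    patient_id: int
    age_years: Optional[float]
    arrival: datetime
    departure: Optional[datetime]
    disposition: str  # hypothetical values, e.g. "discharged", "admitted"


def dq_flags(visit: Visit, known_patients: Set[int],
             pediatric_ed: bool = True, max_pediatric_age: float = 18.0) -> List[str]:
    """Return the names of the (hypothetical) DQ rules a visit violates."""
    flags: List[str] = []
    # 1. Attribute domain constraint: age must be present and plausible.
    if visit.age_years is None or not 0 <= visit.age_years <= 120:
        flags.append("domain:age")
    # 2. Relational integrity: the visit must reference a known patient.
    if visit.patient_id not in known_patients:
        flags.append("relational:patient_id")
    # 3. Historical data rule: departure cannot precede arrival.
    if visit.departure is not None and visit.departure < visit.arrival:
        flags.append("historical:departure_before_arrival")
    # 4. State-dependent object rule: a completed visit must have a departure time.
    if visit.disposition in {"discharged", "admitted"} and visit.departure is None:
        flags.append("state:missing_departure")
    # 5. Attribute dependency rule: plausible age depends on the ED type
    #    (hypothetical cutoff for a children's ED).
    if pediatric_ed and visit.age_years is not None and visit.age_years >= max_pediatric_age:
        flags.append("dependency:age_vs_ed_type")
    return flags


patients = {1, 2}
visits = [
    Visit(10, 1, 7.0, datetime(2015, 3, 1, 10), datetime(2015, 3, 1, 13), "discharged"),
    Visit(11, 3, 7.0, datetime(2015, 3, 1, 10), datetime(2015, 3, 1, 9), "discharged"),
    Visit(12, 2, 45.0, datetime(2015, 3, 2, 8), None, "admitted"),
]
for v in visits:
    print(v.visit_id, dq_flags(v, patients))
# 10 []
# 11 ['relational:patient_id', 'historical:departure_before_arrival']
# 12 ['state:missing_departure', 'dependency:age_vs_ed_type']
```

The function returns flags rather than deleting records. This leaves the analyst free to decide, rule by rule, whether to correct a value, recode it to missing, or exclude the visit.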
SPSS Syntax to address each of the DQ concepts proposed by Kahn et al. (2012) [1] was developed. The data set cleaned using Kahn's framework yielded more accurate results than the data set cleaned without this framework. Future plans involve creating functions in the R language for cleaning data extracted from the EHR, as well as an R package that combines DQ checks with missing data analysis functions.
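Invalid records can shift a statistical parameter estimate. The following minimal Python sketch illustrates this with made-up numbers, not the study's data. It assumes a hypothetical length-of-stay (LOS) variable in hours and a hypothetical domain rule (non-negative and at most 72 hours). Invalid values are simply excluded here, which amounts to a complete-case treatment; the original work may have handled them differently.

```python
from statistics import mean

# Hypothetical LOS values (hours) for six ED visits: -2.0 mimics a departure
# recorded before arrival, and 999.0 mimics a placeholder/sentinel value.
los_hours = [3.5, 4.0, 2.5, -2.0, 5.0, 999.0]


def is_valid_los(hours: float, max_hours: float = 72.0) -> bool:
    """Hypothetical domain rule: LOS must lie in [0, max_hours]."""
    return 0.0 <= hours <= max_hours


# Keep only values that pass the rule (complete-case treatment).
cleaned = [h for h in los_hours if is_valid_los(h)]

raw_mean, clean_mean = mean(los_hours), mean(cleaned)
print(f"raw mean: {raw_mean:.2f} h (n={len(los_hours)})")      # raw mean: 168.67 h (n=6)
print(f"cleaned mean: {clean_mean:.2f} h (n={len(cleaned)})")  # cleaned mean: 3.75 h (n=4)
```

A single implausible value dominates the uncleaned mean. Robust summaries are less affected; in this toy list the median is 3.75 h with or without cleaning.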