Loan R van Hoeven, Martine C de Bruijne, Peter F Kemper, Maria M W Koopman, Jan M M Rondeel, Anja Leyte, Hendrik Koffijberg, Mart P Janssen, Kit C B Roes
Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Universiteitsweg 100, 3508 GA Utrecht, The Netherlands.
Transfusion Technology Assessment Department, Sanquin Research, Plesmanlaan 125, 1066 CX Amsterdam, The Netherlands.
BMC Med Inform Decis Mak. 2017 Jul 14;17(1):107. doi: 10.1186/s12911-017-0504-7.
Although data from electronic health records (EHR) are often used for research purposes, systematic validation of these data before use is not standard practice. Existing validation frameworks discuss validity concepts without translating them into practical implementation steps or addressing the potential influence of linking multiple sources. We therefore developed a practical approach for validating routinely collected data from multiple sources and applied it to a blood transfusion data warehouse to evaluate its usability in practice.
The approach consists of three steps: identifying existing validation frameworks for EHR data or linked data, selecting validity concepts from these frameworks, and establishing quantifiable validity outcomes for each concept. It distinguishes between external validation concepts (e.g. concordance with external reports, previous literature and expert feedback) and internal consistency concepts, which use expected associations within the dataset itself (e.g. completeness, uniformity and plausibility). In an example case, the selected concepts were applied to a transfusion dataset and specified in more detail.
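Each internal consistency concept can be expressed as a quantifiable outcome, typically a proportion of records. The Python sketch below shows one way to do this, assuming records are plain dictionaries. The field names (`product_type`, `volume_ml`), the allowed product codes and the plausible volume range are hypothetical placeholders chosen for illustration, not definitions taken from the study.

```python
from typing import Any, Callable

# Hypothetical record layout: each record maps a field name to a value (None = missing).
Record = dict[str, Any]


def completeness(records: list[Record], field: str) -> float:
    """Completeness: fraction of records in which `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def _present(records: list[Record], field: str) -> list[Any]:
    """Non-missing values of `field`; uniformity and plausibility are computed on these."""
    return [r[field] for r in records if r.get(field) is not None]


def uniformity(records: list[Record], field: str, allowed: set[Any]) -> float:
    """Uniformity: fraction of non-missing values recorded with an allowed code or unit."""
    values = _present(records, field)
    if not values:
        return 0.0
    return sum(1 for v in values if v in allowed) / len(values)


def plausibility(records: list[Record], field: str,
                 is_plausible: Callable[[Any], bool]) -> float:
    """Plausibility: fraction of non-missing values that pass a domain-specific check."""
    values = _present(records, field)
    if not values:
        return 0.0
    return sum(1 for v in values if is_plausible(v)) / len(values)


# Invented toy records, for illustration only (not study data).
sample: list[Record] = [
    {"product_type": "RBC", "volume_ml": 280},
    {"product_type": "RBC", "volume_ml": None},        # missing volume
    {"product_type": "PLT", "volume_ml": 5000},        # implausible volume
    {"product_type": "red cells", "volume_ml": 300},   # non-standard product code
]

if __name__ == "__main__":
    print("completeness(volume_ml):", completeness(sample, "volume_ml"))
    print("uniformity(product_type):", uniformity(sample, "product_type", {"RBC", "PLT", "FFP"}))
    print("plausibility(volume_ml):",
          round(plausibility(sample, "volume_ml", lambda v: 50 <= v <= 1000), 3))
```

Expressing every concept as a proportion keeps the outcomes comparable across variables and lets the same check be rerun after each round of data processing.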
Applying the approach to a transfusion dataset yielded a structured overview of data validity aspects. This made it possible to improve these aspects through further processing of the data and, in some cases, by adjusting the data extraction. For example, the proportion of transfused products that could not be linked to the corresponding issued products was initially 2.2%, but was reduced to 0.17% by adjusting the data extraction criteria.
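The linkage outcome above is a simple proportion: transfused products with no matching issued product, divided by all transfused products. A minimal Python sketch follows, assuming the two sources share a product identifier (`product_id`, hypothetical) and using a date-based extraction window (`extract_issued`, also hypothetical) only to show how an extraction criterion can change the outcome. The toy records are invented and do not reflect the study's data or its actual extraction criteria.

```python
from datetime import date
from typing import Any

Record = dict[str, Any]


def unlinked_proportion(transfused: list[Record], issued: list[Record],
                        key: str = "product_id") -> float:
    """Share of transfused products with no issued product sharing `key` (an anti-join)."""
    if not transfused:
        return 0.0
    issued_keys = {r[key] for r in issued}
    unlinked = sum(1 for r in transfused if r[key] not in issued_keys)
    return unlinked / len(transfused)


def extract_issued(all_issued: list[Record], start: date) -> list[Record]:
    """Hypothetical extraction criterion: keep issue records dated on or after `start`."""
    return [r for r in all_issued if r["issue_date"] >= start]


# Invented toy data: P1 was issued just before the narrow extraction window opens.
all_issued = [
    {"product_id": "P1", "issue_date": date(2013, 12, 31)},
    {"product_id": "P2", "issue_date": date(2014, 1, 2)},
    {"product_id": "P3", "issue_date": date(2014, 1, 3)},
    {"product_id": "P4", "issue_date": date(2014, 1, 5)},
]
transfused = [{"product_id": p} for p in ("P1", "P2", "P3", "P4")]

narrow = extract_issued(all_issued, date(2014, 1, 1))  # P1 falls outside the window
wide = extract_issued(all_issued, date(2013, 12, 1))   # P1 is now included

if __name__ == "__main__":
    print("unlinked (narrow extraction):", unlinked_proportion(transfused, narrow))
    print("unlinked (wider extraction):", unlinked_proportion(transfused, wide))
```

Recomputing the same outcome before and after changing the extraction gives a direct, quantified check of whether the adjustment improved linkage.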
This stepwise approach for validating linked multisource data provides a basis for evaluating data quality and improving the interpretation of the data. Broader adoption of data validation would contribute to greater transparency and reliability of research based on routinely collected electronic health records.