Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network.

Authors

Khare Ritu, Utidjian Levon H, Razzaghi Hanieh, Soucek Victoria, Burrows Evanette, Eckrich Daniel, Hoyt Richard, Weinstein Harris, Miller Matthew W, Soler David, Tucker Joshua, Bailey L Charles

Affiliations

The Children's Hospital of Philadelphia, US.

Seattle Children's Hospital, US.

Publication information

EGEMS (Wash DC). 2019 Aug 1;7(1):36. doi: 10.5334/egems.294.

Abstract

BACKGROUND

Clinical data research networks (CDRNs) aggregate electronic health record data from multiple hospitals to enable large-scale research. A critical operation toward building a CDRN is conducting continual evaluations to optimize data quality. The key challenges include determining the assessment coverage on big datasets, handling data variability over time, and facilitating communication with data teams. This study presents the evolution of a systematic workflow for data quality assessment in CDRNs.

IMPLEMENTATION

Using a specific CDRN as a use case, the workflow was iteratively developed and packaged into a toolkit. The resultant toolkit comprises 685 data quality checks to identify data quality issues, procedures to reconcile new findings with a history of known issues, and a GitHub-based reporting mechanism for organized tracking.
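The reconciliation step described above can be illustrated with a minimal sketch. This is not the toolkit's actual code: the `Issue` fields, the `reconcile` function, and the sample site and table names are all hypothetical, chosen only to show how newly detected issues might be separated from ones already on record.

```python
from dataclasses import dataclass


# Hypothetical issue record; the real toolkit's schema is not given in the abstract.
@dataclass(frozen=True, order=True)
class Issue:
    site: str
    table: str
    field: str
    check: str


def reconcile(detected, known):
    """Compare the issues detected in the current run with the known-issue history.

    Returns three sorted lists:
      new                 - detected now, not previously recorded
      recurring           - detected now and already recorded
      resolved_candidates - recorded earlier, not detected in this run
    """
    detected_set, known_set = set(detected), set(known)
    return {
        "new": sorted(detected_set - known_set),
        "recurring": sorted(detected_set & known_set),
        "resolved_candidates": sorted(known_set - detected_set),
    }


if __name__ == "__main__":
    known = [Issue("site_a", "visit", "visit_date", "missing_values")]
    detected = [
        Issue("site_a", "visit", "visit_date", "missing_values"),
        Issue("site_b", "drug_exposure", "dose", "outlier"),
    ]
    for bucket, issues in reconcile(detected, known).items():
        print(bucket, issues)
```

Under this assumed design, only the "new" bucket would need to be reported to the data teams, which is one way to avoid re-raising issues that are already being tracked.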

RESULTS

During the first two years of network development, the toolkit assisted in discovering over 800 data characteristics and resolving over 1400 programming errors. Longitudinal analysis indicated that the variability in time to resolution (mean 15 days, IQR 24 days) is attributable to the underlying cause of the issue, the perceived importance of the domain, and the complexity of the assessment.
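As an illustration of how the time-to-resolution summary (mean and interquartile range) could be computed from issue open and close dates, here is a minimal sketch. The function name, the quartile method, and the sample dates are assumptions made for this example; they are not the study's data or its analysis code.

```python
from datetime import date
from statistics import mean, quantiles


def resolution_stats(issues):
    """Summarize time to resolution for (opened, closed) date pairs.

    Returns (mean_days, iqr_days). Quartiles use the 'inclusive' method,
    which is an assumption; the abstract does not state which method was used.
    """
    days = [(closed - opened).days for opened, closed in issues]
    q1, _median, q3 = quantiles(days, n=4, method="inclusive")
    return mean(days), q3 - q1


if __name__ == "__main__":
    # Made-up dates for illustration only.
    sample = [
        (date(2016, 1, 1), date(2016, 1, 3)),
        (date(2016, 1, 1), date(2016, 1, 6)),
        (date(2016, 1, 1), date(2016, 1, 10)),
        (date(2016, 1, 1), date(2016, 1, 15)),
        (date(2016, 1, 1), date(2016, 1, 21)),
        (date(2016, 1, 1), date(2016, 2, 10)),
    ]
    mean_days, iqr_days = resolution_stats(sample)
    print(f"mean = {mean_days:.1f} days, IQR = {iqr_days:.1f} days")
```

An IQR that is large relative to the mean, as reported in the abstract, indicates a wide spread of resolution times across issues.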

CONCLUSIONS

In the absence of a formalized data quality framework, CDRNs continue to face challenges in data management and query fulfillment. The proposed data quality toolkit was empirically validated on a particular network and is publicly available for other networks. While the toolkit is user-friendly and effective, the usage statistics indicated that the data quality process is very time-intensive, and sufficient resources should be dedicated to investigating problems and optimizing data for research.
