Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network.

Author information

Khare Ritu, Utidjian Levon H, Razzaghi Hanieh, Soucek Victoria, Burrows Evanette, Eckrich Daniel, Hoyt Richard, Weinstein Harris, Miller Matthew W, Soler David, Tucker Joshua, Bailey L Charles

Affiliations

The Children's Hospital of Philadelphia, US.

Seattle Children's Hospital, US.

Publication information

EGEMS (Wash DC). 2019 Aug 1;7(1):36. doi: 10.5334/egems.294.

DOI:10.5334/egems.294

PMID:31531382

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6676917/

Abstract

BACKGROUND

Clinical data research networks (CDRNs) aggregate electronic health record data from multiple hospitals to enable large-scale research. A critical operation toward building a CDRN is conducting continual evaluations to optimize data quality. The key challenges include determining the assessment coverage on big datasets, handling data variability over time, and facilitating communication with data teams. This study presents the evolution of a systematic workflow for data quality assessment in CDRNs.

IMPLEMENTATION

Using a specific CDRN as a use case, the workflow was iteratively developed and packaged into a toolkit. The resulting toolkit comprises 685 data quality checks to identify data quality issues, procedures to reconcile findings with a history of known issues, and a contemporary GitHub-based reporting mechanism for organized tracking.
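The reconciliation step can be illustrated with a minimal sketch. This is not the toolkit's code: the `Issue` fields, the `reconcile` function, and the example check codes are all hypothetical, and the sketch assumes that an issue is identified by its site, table, field, and check.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Issue:
    """Hypothetical issue key; the real toolkit may identify issues differently."""
    site: str        # contributing site (assumed identifier)
    table: str       # table the check ran against
    field: str       # field the check ran against
    check_code: str  # assumed identifier of the data quality check


def reconcile(detected, known):
    """Compare issues from the current run with the history of known issues."""
    detected_set, known_set = set(detected), set(known)
    return {
        # Needs a new report to the data team.
        "new": sorted(detected_set - known_set),
        # Already tracked; no duplicate report needed.
        "persisting": sorted(detected_set & known_set),
        # Previously known but absent from this run; candidate for closing.
        "no_longer_detected": sorted(known_set - detected_set),
    }


known = [
    Issue("site_a", "visit", "visit_date", "missing"),
    Issue("site_a", "drug", "dose", "outlier"),
]
detected = [
    Issue("site_a", "visit", "visit_date", "missing"),
    Issue("site_a", "person", "birth_date", "future_date"),
]
result = reconcile(detected, known)
```

Only the entries under `"new"` would need a fresh tracking item, which keeps repeated runs from re-reporting issues the data teams already know about.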

RESULTS

During the first two years of network development, the toolkit assisted in discovering over 800 data characteristics and resolving over 1400 programming errors. Longitudinal analysis indicated that the variability in time to resolution (15-day mean, 24-day IQR) is due to the underlying cause of the issue, perceived importance of the domain, and the complexity of assessment.
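As a rough illustration of the two summary statistics quoted above, the sketch below computes the mean and interquartile range of time to resolution from issue open and close dates. The function name, the input layout, and the sample dates are hypothetical, and the abstract does not say which quantile method the authors used; the inclusive method is an assumption here.

```python
from datetime import date, timedelta
from statistics import mean, quantiles


def resolution_stats(issues):
    """Return (mean_days, iqr_days) for a list of (opened, closed) date pairs."""
    days = [(closed - opened).days for opened, closed in issues]
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(days, n=4, method="inclusive")
    return mean(days), q3 - q1


# Made-up sample data, for illustration only.
opened = date(2020, 1, 1)
sample = [(opened, opened + timedelta(days=d)) for d in (1, 3, 5, 7, 9)]
mean_days, iqr_days = resolution_stats(sample)
```

A wide IQR relative to the mean, as reported in the abstract, indicates that resolution times vary considerably from issue to issue.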

CONCLUSIONS

In the absence of a formalized data quality framework, CDRNs continue to face challenges in data management and query fulfillment. The proposed data quality toolkit was empirically validated on a particular network and is publicly available for other networks. While the toolkit is user-friendly and effective, the usage statistics indicated that the data quality process is very time-intensive and that sufficient resources should be dedicated to investigating problems and optimizing data for research.
