Improving Data Quality in Clinical Research Informatics Tools.

Author information

AbuHalimeh Ahmed

Affiliation

Information Science Department, University of Arkansas at Little Rock, Little Rock, AR, United States.

Publication information

Front Big Data. 2022 Apr 29;5:871897. doi: 10.3389/fdata.2022.871897. eCollection 2022.

Abstract

Maintaining data quality is a fundamental requirement for successful, long-term data management. Providing high-quality, reliable, and statistically sound data is a primary goal of clinical research informatics. In addition, effective data governance and management are essential to ensuring accurate data counts, reports, and validation. As a crucial step of the clinical research process, it is important to establish and maintain organization-wide standards for data quality management to ensure consistency across all systems designed primarily for cohort identification. These systems allow users to perform an enterprise-wide search on a clinical research data repository to determine whether a set of patients meeting certain inclusion or exclusion criteria exists. Some of these clinical research tools are referred to as de-identified data tools. Assessing and improving the quality of data used by clinical research informatics tools are both important and difficult tasks. For the increasing number of users who rely on information as one of their most important assets, enforcing high data quality levels represents a strategic investment to preserve the value of the data. In clinical research informatics, better data quality translates into better research results and better patient care. However, achieving high data quality standards is a major task because of the variety of ways in which errors can be introduced into a system and the difficulty of correcting them systematically. Problems with data quality tend to fall into two categories. The first is inconsistency among data resources, such as format, syntax, and semantic inconsistencies. The second is poor ETL and data mapping processes. In this paper, we describe a real-life case study on assessing and improving data quality at a healthcare organization.
This paper compares the results obtained from two de-identified data systems, i2b2 and Epic SlicerDicer, and discusses the data quality dimensions specific to the clinical research informatics context and the possible data quality issues between the de-identified systems. It aims to propose steps and rules for maintaining data quality across different systems, to help data managers, information systems teams, and informaticists at any healthcare organization monitor and sustain data quality as part of their business intelligence, data governance, and data democratization processes.
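
The comparison of cohort counts between two de-identified systems can be illustrated with a minimal Python sketch. It is not the method used in the paper: the function name `compare_cohort_counts`, the 5% tolerance, and the sample counts are all hypothetical and chosen only for illustration. The idea is to run the same inclusion criteria in both systems, then flag any criterion whose counts diverge beyond a tolerance or that is present in only one system, since either may point to a mapping or ETL problem.

```python
from dataclasses import dataclass


@dataclass
class CountComparison:
    criterion: str
    count_a: int
    count_b: int
    relative_diff: float
    flagged: bool


def compare_cohort_counts(counts_a, counts_b, tolerance=0.05):
    """Compare per-criterion patient counts reported by two systems.

    A criterion is flagged when it is missing from one system or when the
    relative difference between the two counts exceeds `tolerance`
    (a hypothetical threshold, not a value taken from the paper).
    """
    results = []
    for criterion in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(criterion)
        b = counts_b.get(criterion)
        if a is None or b is None:
            # Present in only one system: a possible mapping or ETL gap.
            results.append(CountComparison(criterion, a or 0, b or 0, 1.0, True))
            continue
        largest = max(a, b)
        rel = abs(a - b) / largest if largest else 0.0
        results.append(CountComparison(criterion, a, b, rel, rel > tolerance))
    return results


if __name__ == "__main__":
    # Made-up counts for illustration only; not data from the case study.
    system_a = {"type 2 diabetes": 1000, "hypertension": 2500, "asthma": 400}
    system_b = {"type 2 diabetes": 990, "hypertension": 2000, "copd": 300}
    for row in compare_cohort_counts(system_a, system_b):
        status = "REVIEW" if row.flagged else "ok"
        print(f"{row.criterion}: {row.count_a} vs {row.count_b} [{status}]")
```

A relative rather than absolute difference is used here so that large and small cohorts are judged on the same scale; a real deployment would need a threshold agreed on by the data governance team.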


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b20/9102971/0d2be9c63cf9/fdata-05-871897-g0001.jpg
