Definition of a Practical Taxonomy for Referencing Data Quality Problems in Health Care Databases.

Affiliations

Univ. Lille, CHU Lille, ULR 2694 - METRICS: Évaluation des Technologies de Santé et des Pratiques Médicales, Lille, France.

Department of Anesthesiology and Intensive Care Unit, Groupe Hospitalier de la Région de Mulhouse et Sud-Alsace, Mulhouse, France.

Publication information

Methods Inf Med. 2023 May;62(1-02):19-30. doi: 10.1055/a-1976-2371. Epub 2022 Nov 10.

Abstract

INTRODUCTION

Health care information systems can generate and/or record huge volumes of data, some of which may be reused for research, clinical trials, or teaching. However, these databases can be affected by data quality problems; hence, an important step in the data reuse process is detecting and rectifying these issues. To facilitate the assessment of data quality, we developed a taxonomy of data quality problems in operational databases.

MATERIAL

We searched the literature for publications that mentioned "data quality problems," "data quality taxonomy," "data quality assessment," or "dirty data." The publications were then reviewed, compared, summarized, and structured using a bottom-up approach to provide an operational taxonomy of data quality problems. These problems were illustrated with fictional but realistic examples from clinical databases.

RESULTS

Twelve publications were selected, and 286 instances of data quality problems were identified and classified according to six distinct levels of granularity. We used the classification defined by Oliveira et al. to structure our taxonomy. The extracted items were grouped into 53 data quality problems.

DISCUSSION

This taxonomy facilitated the systematic assessment of data quality in databases by presenting the data's quality according to their granularity. The definition of this taxonomy is the first step in the data cleaning process. The subsequent steps include the definition of associated quality assessment methods and data cleaning methods.

CONCLUSION

Our new taxonomy enabled the classification and illustration of 53 data quality problems found in hospital databases.
