Towards a content agnostic computable knowledge repository for data quality assessment.

Affiliation

Department of Biomedical Informatics, Center for Clinical and Translational Sciences (CCTS) Biomedical Informatics Core, University of Utah, 421 Wakara Way, Suite 140, Salt Lake City, UT 84108-3514, USA.

Publication information

Comput Methods Programs Biomed. 2019 Aug;177:193-201. doi: 10.1016/j.cmpb.2019.05.017. Epub 2019 May 24.

Abstract

BACKGROUND AND OBJECTIVE

In recent years, several conceptual frameworks for assessing the quality of data have been proposed across the Data Quality and Information Quality domains. These frameworks are diverse, ranging from simple lists of concepts to complex ontological and taxonomical representations of data quality concepts. The goal of this study is to design, develop, and implement a platform-agnostic, computable data quality knowledge repository for data quality assessments.

METHODS

We identified computable data quality concepts through a comprehensive literature review of articles indexed in three major bibliographic data sources. From this corpus, we extracted data quality concepts, their definitions, applicable measures, and their computability, and identified the conceptual relationships among them. We used these relationships to design and develop a data quality meta-model and implemented it in a quality knowledge repository.

RESULTS

We identified three primitives for programmatically performing data quality assessments: a data quality concept, its definition, and its measure or rule for data quality assessment, together with the associations among them. We modeled a computable data quality meta-data repository and extended this framework to adapt, store, retrieve, and automate the assessment of other existing data quality assessment models.
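
As an illustration only, the three primitives and their associations might be represented as in the minimal Python sketch below. The abstract does not describe the authors' schema, so every name here (`Concept`, `Measure`, `Repository`, `fraction_non_missing`) and the "completeness" definition are hypothetical assumptions, not the published meta-model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Concept:
    """A data quality concept paired with its definition (hypothetical shape)."""
    name: str
    definition: str


@dataclass(frozen=True)
class Measure:
    """A computable measure or rule that scores a sequence of values."""
    name: str
    compute: Callable[[Sequence[Optional[object]]], float]


@dataclass
class Repository:
    """Stores concepts and the associations from each concept to its measures."""
    concepts: Dict[str, Concept] = field(default_factory=dict)
    measures: Dict[str, List[Measure]] = field(default_factory=dict)

    def register(self, concept: Concept, measure: Measure) -> None:
        # Record the concept and associate the measure with it.
        self.concepts[concept.name] = concept
        self.measures.setdefault(concept.name, []).append(measure)

    def assess(self, concept_name: str,
               values: Sequence[Optional[object]]) -> Dict[str, float]:
        # Run every measure associated with the named concept.
        return {m.name: m.compute(values)
                for m in self.measures.get(concept_name, [])}


def fraction_non_missing(values: Sequence[Optional[object]]) -> float:
    """Share of values that are not None; 0.0 for an empty sequence."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


repo = Repository()
repo.register(
    # Illustrative definition, not taken from the paper.
    Concept("completeness", "Degree to which expected values are present."),
    Measure("fraction_non_missing", fraction_non_missing),
)
result = repo.assess("completeness", [1, None, 3, 4])
print(result)
```

Keeping the measure as a plain callable attached to a named concept is one way to make the stored knowledge directly executable; a real repository would more likely persist rules in a database or a rule language rather than in memory.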

CONCLUSION

We identified research gaps in the data quality literature regarding the automation of data quality assessment methods. In the process, we designed, developed, and implemented a computable data quality knowledge repository for assessing quality and characterizing data in health data repositories. We leverage this knowledge repository in a service-oriented architecture to provide a scalable and reproducible framework for data quality assessment across disparate biomedical data sources.
