Johnson Steven G, Speedie Stuart, Simon Gyorgy, Kumar Vipin, Westra Bonnie L
University of Minnesota, Institute for Health Informatics.
University of Minnesota, Department of Computer Science.
AMIA Annu Symp Proc. 2015 Nov 5;2015:1937-46. eCollection 2015.
The secondary use of EHR data for research is expected to improve health outcomes for patients, but the benefits will only be realized if the data in the EHR are of sufficient quality to support these uses. A data quality (DQ) ontology was developed to rigorously define concepts and enable automated computation of data quality measures. The healthcare data quality literature was mined for the important terms used to describe data quality concepts, and these terms were harmonized into an ontology. Four high-level data quality dimensions ("correctness", "consistency", "completeness" and "currency") categorize 19 lower-level measures. The ontology serves as an unambiguous vocabulary that defines concepts more precisely than natural language, provides a mechanism to automatically compute data quality measures, and is reusable across domains and use cases. A detailed example is presented to demonstrate its utility. The DQ ontology can make data validation more common and reproducible.