Managing data quality for a drug safety surveillance system.

Affiliation

College of Pharmacy, University of Florida, Gainesville, FL, USA

Publication information

Drug Saf. 2013 Oct;36 Suppl 1:S49-58. doi: 10.1007/s40264-013-0098-7.

Abstract

OBJECTIVE

The objective of this study is to present a data quality assurance program for disparate data sources loaded into a Common Data Model, and to highlight the data quality issues identified and the resolutions implemented.

BACKGROUND

The Observational Medical Outcomes Partnership is conducting methodological research to develop a system to monitor drug safety. Standard processes and tools are needed to ensure continuous data quality across a network of disparate databases and to ensure that extract-transform-load (ETL) procedures maintain data integrity. Currently, there is no consensus or standard approach for evaluating the quality of the source data or of the ETL procedures.
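
One generic way to check that an ETL step preserved data integrity is to reconcile each loaded record against its source record, field by field. The Python sketch below is illustrative only and is not taken from the paper: the function `reconcile` and the record layout (`record_id`, `zip`, `quantity`) are hypothetical, and the sample values are invented to mimic the kinds of ETL errors the study reports (a zip code loaded incorrectly, a drug quantity rounded).

```python
from typing import Any

# Hypothetical record layout: each record is a plain dict keyed by field name.
Record = dict[str, Any]


def reconcile(source: list[Record], loaded: list[Record], key: str,
              fields: list[str]) -> list[str]:
    """Compare loaded records with their source records, field by field.

    Returns human-readable discrepancy messages; an empty list means the
    checked fields survived the ETL unchanged.
    """
    problems: list[str] = []
    loaded_by_key = {rec[key]: rec for rec in loaded}
    for src in source:
        dst = loaded_by_key.get(src[key])
        if dst is None:
            problems.append(f"{key}={src[key]}: missing after load")
            continue
        for field in fields:
            if src.get(field) != dst.get(field):
                problems.append(
                    f"{key}={src[key]}: {field} {src.get(field)!r} -> {dst.get(field)!r}"
                )
    # Records that exist only in the loaded data are also suspicious.
    source_keys = {rec[key] for rec in source}
    for k in loaded_by_key:
        if k not in source_keys:
            problems.append(f"{key}={k}: not present in source")
    return problems


# Invented example data: record 1 loses a leading zero and has its quantity rounded.
source = [
    {"record_id": 1, "zip": "03060", "quantity": 2.5},
    {"record_id": 2, "zip": "32610", "quantity": 30},
]
loaded = [
    {"record_id": 1, "zip": "3060", "quantity": 2},
    {"record_id": 2, "zip": "32610", "quantity": 30},
]
for message in reconcile(source, loaded, "record_id", ["zip", "quantity"]):
    print(message)
```

Exact equality is deliberately strict here; a real pipeline would need per-field rules wherever the ETL is supposed to transform values (for example, mapping source codes to standard vocabularies).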

METHODS

We propose a framework for a comprehensive process that ensures data quality throughout the steps used to process and analyze the data. The approach to managing data anomalies includes: (1) characterization of data sources; (2) detection of data anomalies; (3) determination of the causes of data anomalies; and (4) remediation.
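
The first two steps of such a framework, characterization and detection, can be partly automated; determining causes and remediating them generally require human investigation. The Python sketch below is a hypothetical illustration, not the authors' tooling: `profile` summarizes missingness and value frequencies per field, and `detect` flags fields whose missing fraction exceeds an arbitrary, user-chosen threshold (`max_missing`). The field names and sample records are invented.

```python
from collections import Counter
from typing import Any


def profile(records: list[dict[str, Any]], fields: list[str]) -> dict[str, dict[str, Any]]:
    """Characterization: summarize missingness and value frequencies per field."""
    n = len(records)
    summary: dict[str, dict[str, Any]] = {}
    for field in fields:
        values = [rec.get(field) for rec in records]
        missing = sum(v is None for v in values)
        summary[field] = {
            "missing_fraction": missing / n if n else 0.0,
            "frequencies": Counter(v for v in values if v is not None),
        }
    return summary


def detect(summary: dict[str, dict[str, Any]], max_missing: float) -> list[str]:
    """Detection: flag fields whose missing fraction exceeds a chosen threshold."""
    return [field for field, stats in summary.items()
            if stats["missing_fraction"] > max_missing]


# Invented person records; None marks a value that was not recorded.
persons = [
    {"person_id": 1, "gender": "F", "race": None, "year_of_birth": 1950},
    {"person_id": 2, "gender": "M", "race": None, "year_of_birth": 1962},
    {"person_id": 3, "gender": "F", "race": "White", "year_of_birth": None},
    {"person_id": 4, "gender": "F", "race": None, "year_of_birth": 1975},
]
summary = profile(persons, ["gender", "race", "year_of_birth"])
print(detect(summary, max_missing=0.5))  # prints ['race']
```

A flagged field is only a starting point for step 3: a high missing fraction may reflect the source system (the field was never collected) rather than an ETL defect.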

FINDINGS

Data anomalies included incomplete raw data (no race or year of birth recorded) and implausible data (year of birth later than the current year, observation period end date preceding the start date, and suspicious data frequencies and proportions outside the normal range). Examples of errors found in the ETL process were incorrectly loaded zip codes, rounded drug quantities, incorrectly calculated drug exposure lengths, and incorrectly programmed condition lengths.
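
Several of these anomaly types map naturally onto simple rule-based checks. The Python sketch below is a hypothetical illustration rather than the authors' implementation: the `Person` and `ObservationPeriod` classes only loosely mirror the Common Data Model's person and observation period tables, and `current_year` is passed explicitly (2013 in the example) so that results do not depend on when the code runs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical, simplified stand-ins for CDM person and observation period rows.
@dataclass
class Person:
    person_id: int
    year_of_birth: Optional[int]
    race: Optional[str]


@dataclass
class ObservationPeriod:
    person_id: int
    start_date: date
    end_date: date


def person_issues(p: Person, current_year: int) -> list[str]:
    """Completeness and plausibility checks for one person record."""
    issues = []
    if p.year_of_birth is None:
        issues.append("year of birth not recorded")
    elif p.year_of_birth > current_year:
        issues.append("year of birth exceeds current year")
    if p.race is None:
        issues.append("race not recorded")
    return issues


def period_issues(op: ObservationPeriod) -> list[str]:
    """Plausibility check: a period must not end before it starts."""
    if op.end_date < op.start_date:
        return ["observation period end date precedes start date"]
    return []


# Invented example records.
people = [Person(1, 1948, "Black"), Person(2, 2031, None), Person(3, None, "White")]
periods = [ObservationPeriod(1, date(2008, 1, 1), date(2010, 12, 31)),
           ObservationPeriod(2, date(2009, 6, 1), date(2009, 3, 1))]

for p in people:
    for issue in person_issues(p, current_year=2013):
        print(p.person_id, issue)
for op in periods:
    for issue in period_issues(op):
        print(op.person_id, issue)
```

Record-level rules like these catch the implausible values listed above; distribution-level anomalies, such as suspicious frequencies, need aggregate checks instead.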

CONCLUSIONS

Complete and reliable observational data are difficult to obtain, and because the data are regularly updated, data quality assurance must be continuous. Processes to assess data quality should therefore be ongoing and transparent.
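
Because each data refresh can introduce new anomalies, one simple ongoing check is to compare category proportions between successive refreshes and flag large shifts. The Python sketch below is a hypothetical example, not a method described in the paper: `drift` flags any category whose share changes by more than an absolute `tolerance` (0.05 here, an arbitrary choice), including categories that appear or disappear. The counts are invented.

```python
def proportions(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw category counts into shares of the total."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def drift(previous: dict[str, int], current: dict[str, int],
          tolerance: float) -> dict[str, tuple[float, float]]:
    """Return categories whose share changed by more than `tolerance` between refreshes."""
    prev_p, curr_p = proportions(previous), proportions(current)
    flagged = {}
    # Union of categories, so new or vanished categories are compared against 0.
    for category in sorted(prev_p.keys() | curr_p.keys()):
        before, after = prev_p.get(category, 0.0), curr_p.get(category, 0.0)
        if abs(after - before) > tolerance:
            flagged[category] = (before, after)
    return flagged


# Invented gender counts from two refreshes of the same source.
previous = {"F": 5200, "M": 4800}
current = {"F": 5100, "M": 3900, "U": 1000}
print(drift(previous, current, tolerance=0.05))
```

An absolute tolerance is the simplest choice; rare categories may call for a relative threshold or a statistical test instead.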
