Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA 02215, USA.
Med Care. 2013 Aug;51(8 Suppl 3):S22-9. doi: 10.1097/MLR.0b013e31829b1e2c.
Electronic health information routinely collected during health care delivery and reimbursement can help address the need for evidence about the real-world effectiveness, safety, and quality of medical care. Often, distributed networks that combine information from multiple sources are needed to generate this real-world evidence.
We provide a set of field-tested best practices and a set of recommendations for data quality checking for comparative effectiveness research (CER) in distributed data networks.
We explore the requirements for data quality checking and describe the data quality approaches undertaken by several existing multi-site networks.
There are no established standards for evaluating the quality of electronic health data for CER within distributed networks. Data checks of increasing complexity are often used, ranging from consistency with syntactic rules to evaluation of semantics and of consistency within and across sites. Checks of temporal trends within and across sites are widely used, as are checks of each data refresh or update. Checks of the rates of specific events and exposures by age group, sex, and month are also common.
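The layered checks described above can be illustrated with a minimal Python sketch. It is not taken from any of the networks surveyed: the record layout (`site`, `sex`, `birth_year`, `event_date`), the allowed value set, the function names, and the 25% tolerance are all hypothetical choices made for illustration. It covers a syntactic check, one semantic check, monthly counts by site, and a comparison of two data refreshes; stratification by age group and sex would follow the same pattern.

```python
from collections import Counter
from datetime import date

ALLOWED_SEX = {"F", "M", "U"}  # assumed value set, not from the source


def syntactic_errors(record):
    """Return format-level rule violations for one record."""
    errors = []
    if record.get("sex") not in ALLOWED_SEX:
        errors.append("sex not in allowed value set")
    try:
        date.fromisoformat(record.get("event_date", ""))
    except (TypeError, ValueError):
        errors.append("event_date is not an ISO date")
    if not isinstance(record.get("birth_year"), int):
        errors.append("birth_year is not an integer")
    return errors


def semantic_errors(record):
    """Cross-field plausibility check; assumes the record passed the syntactic checks."""
    event_year = date.fromisoformat(record["event_date"]).year
    if event_year < record["birth_year"]:
        return ["event precedes birth year"]
    return []


def monthly_counts(records):
    """Count events per (site, 'YYYY-MM') so trends can be compared within and across sites."""
    return Counter((r["site"], r["event_date"][:7]) for r in records)


def refresh_shifts(previous, current, tolerance=0.25):
    """Flag (site, month) cells whose count changed by more than `tolerance`
    (as a fraction of the previous count) between two data refreshes.
    Cells that are new in the current refresh are always flagged."""
    flagged = {}
    for key in sorted(set(previous) | set(current)):
        before, after = previous.get(key, 0), current.get(key, 0)
        if before == 0 or abs(after - before) / before > tolerance:
            flagged[key] = (before, after)
    return flagged
```

In a distributed setting, each site could run such functions locally and return only the aggregate counts and flagged cells, so that record-level data stay behind the site's firewall; this usage pattern is likewise an assumption, not a description of a specific network.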
Secondary use of electronic health data for CER holds promise but is complex, especially in distributed data networks that incorporate periodic data refreshes. The viability of a learning health system depends on a robust understanding of the quality, validity, and optimal secondary uses of routinely collected electronic health data within distributed health data networks. Robust data quality checking can strengthen confidence in findings based on distributed data networks.