Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine, University of Porto, Porto, Portugal
Centre for Health Technology and Services Research (CINTESIS), Faculty of Medicine, University of Porto, Porto, Portugal.
BMJ Open. 2021 Dec 6;11(12):e047623. doi: 10.1136/bmjopen-2020-047623.
High-quality data are crucial for guiding decision-making and practising evidence-based healthcare, especially if previous knowledge is lacking. Nevertheless, data quality frailties have been exposed worldwide during the current COVID-19 pandemic. Focusing on a major Portuguese epidemiological surveillance dataset, our study aims to assess COVID-19 data quality issues and suggest possible solutions.
On 27 April 2020, the Portuguese Directorate-General of Health (DGS) made a dataset (DGSApril) available to researchers upon request. On 4 August 2020, an updated dataset (DGSAugust) was also obtained.
All COVID-19-confirmed cases notified through the medical component of the National System for Epidemiological Surveillance until the end of June 2020.
Data completeness and consistency.
DGSAugust did not follow the same data format and variables as DGSApril, and a significant amount of missing data and a significant number of inconsistencies were found (eg, 4075 cases from DGSApril were apparently not included in DGSAugust). Several variables also showed a low degree of completeness and/or changed their values from one dataset to the other (eg, for the variable 'underlying conditions', more than half of the cases showed different information between datasets). There were also significant inconsistencies between the numbers of COVID-19 cases and deaths shown in DGSAugust and those in the reports that the DGS made publicly available each day.
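The kinds of checks reported above (cases missing from the later extract, values that changed between extracts, and per-variable completeness) can be illustrated with a minimal sketch. It is not the authors' method: the records, the case identifiers, the variable names (`age`, `underlying_conditions`) and the helper functions are all hypothetical, and the real DGS datasets are not reproduced here.

```python
# Hypothetical toy extracts keyed by a case identifier. The structure and
# variable names are assumptions for illustration, not the DGS schema.
april = {
    1: {"age": 34, "underlying_conditions": "none"},
    2: {"age": 71, "underlying_conditions": "diabetes"},
    3: {"age": None, "underlying_conditions": "asthma"},
}
august = {
    1: {"age": 34, "underlying_conditions": "none"},
    2: {"age": 71, "underlying_conditions": None},
    4: {"age": 52, "underlying_conditions": "none"},
}


def completeness(records, variable):
    """Share of records with a non-missing value for `variable`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records.values() if r.get(variable) is not None)
    return filled / len(records)


def missing_cases(earlier, later):
    """Identifiers present in the earlier extract but absent from the later one."""
    return sorted(set(earlier) - set(later))


def changed_values(earlier, later, variable):
    """Identifiers of shared cases whose `variable` differs between extracts."""
    shared = set(earlier) & set(later)
    return sorted(
        k for k in shared
        if earlier[k].get(variable) != later[k].get(variable)
    )


dropped = missing_cases(april, august)
changed = changed_values(april, august, "underlying_conditions")
age_completeness_april = completeness(april, "age")
```

In this toy example, a value that becomes missing in the later extract counts as a change. Whether that is the right convention depends on how the datasets encode missing data.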
Important quality issues in the Portuguese COVID-19 surveillance datasets were described. These issues can limit the usability of surveillance data for informing good decisions and performing useful research. Major improvements in surveillance datasets are therefore urgently needed (for example, simplification of data entry processes, constant monitoring of data, and increased training and awareness of healthcare providers), as low data quality may lead to deficient pandemic control.