Data cleaning process for HIV-indicator data extracted from DHIS2 national reporting system: a case study of Kenya.

Affiliations

Department of Information Science and Media Studies, University of Bergen, Bergen, Norway.

Institute of Biomedical Informatics, Moi University, Eldoret, Kenya.

Publication information

BMC Med Inform Decis Mak. 2020 Nov 13;20(1):293. doi: 10.1186/s12911-020-01315-7.

Abstract

BACKGROUND

The District Health Information Software-2 (DHIS2) is widely used by countries for national-level aggregate reporting of health data. To best leverage DHIS2 data for decision-making, countries need to ensure that the data within their systems are of the highest quality. Comprehensive, systematic, and transparent data-cleaning approaches form a core component of preparing DHIS2 data for analysis. Unfortunately, there is a paucity of exhaustive and systematic descriptions of the data-cleaning processes applied to DHIS2-based data. The aim of this study was to report the methods and results of a systematic and replicable data-cleaning approach applied to HIV data gathered within DHIS2 in Kenya from 2011 to 2018, for secondary analyses.

METHODS

Six programmatic-area reports containing HIV indicators were extracted from DHIS2 for all care facilities in all counties in Kenya, covering 2011 to 2018. Extracted variables included reporting rate, reporting timeliness, and HIV-indicator data elements per facility per year. In total, 93,179 facility-records from 11,446 health facilities were extracted. Van den Broeck et al.'s framework, which involves repeated cycles of a three-phase process (data screening, data diagnosis, and data treatment), was applied semi-automatically within a generic five-step data-cleaning sequence that was developed for, and applied to, the extracted data. Various quality issues were identified, and a Friedman analysis of variance was conducted to examine differences in the distribution of records with selected issues across the eight years.
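
To make the three-phase cycle concrete, the following Python sketch runs one screening, diagnosis, and treatment pass over facility-records. It is a minimal illustration under stated assumptions, not the authors' implementation: the `FacilityRecord` fields and the functions `screen`, `diagnose`, and `treat` are hypothetical names, and the only two checks shown (a record with no data at all, and a reporting rate above 100%) are the issues named in this abstract. Routing over-100% records to manual review rather than deleting them is an assumption meant to mirror the semi-automatic character of the process.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FacilityRecord:
    # Hypothetical fields; real DHIS2 export columns will differ.
    facility_id: str
    year: int
    reporting_rate: Optional[float]  # percent; None if nothing was reported
    indicator_values: dict = field(default_factory=dict)


def screen(record):
    """Phase 1 (screening): return codes for suspect patterns in a record."""
    issues = []
    if record.reporting_rate is None and not record.indicator_values:
        issues.append("no_data")
    if record.reporting_rate is not None and record.reporting_rate > 100:
        issues.append("rate_over_100")
    return issues


def diagnose(issues):
    """Phase 2 (diagnosis): decide what the detected issues mean."""
    if "no_data" in issues:
        return "remove"
    if "rate_over_100" in issues:
        return "review"  # assumption: left for a human to inspect
    return "keep"


def treat(records):
    """Phase 3 (treatment): apply each decision; one pass of the cycle."""
    kept, flagged, removed = [], [], []
    for rec in records:
        decision = diagnose(screen(rec))
        if decision == "remove":
            removed.append(rec)
        elif decision == "review":
            flagged.append(rec)
        else:
            kept.append(rec)
    return kept, flagged, removed


# Toy records for illustration only; not data from the study.
records = [
    FacilityRecord("F1", 2015, 92.0, {"hiv_tested": 14}),
    FacilityRecord("F2", 2015, None, {}),
    FacilityRecord("F3", 2016, 108.3, {"hiv_tested": 3}),
]
kept, flagged, removed = treat(records)
print(len(kept), len(flagged), len(removed))  # 1 1 1
```

In a full pipeline, the flagged records would be inspected by hand and the cycle repeated until screening surfaces no new issues; that outer loop is omitted here.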

RESULTS

Facility-records with no data accounted for 50.23% and were removed. Of the remaining records, 0.03% had reporting rates above 100%. Of the facility-records with reporting data, 0.66% and 0.46% were retained for the voluntary medical male circumcision and blood safety programmatic-area reports, respectively, because few facilities submitted data for or offered these services. The distribution of facility-records with selected quality issues varied significantly by programmatic area (p < 0.001). The final clean dataset was suitable for subsequent secondary analyses.
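
The Friedman test behind the reported p-value ranks the values within each block and asks whether the treatments' rank sums differ more than chance would allow. The plain-Python sketch below computes the tie-corrected Friedman chi-square statistic. Treating years as blocks (rows) and programmatic areas as treatments (columns), with each cell holding something like the share of records with a given issue, is an assumption about how the analysis was arranged; `average_ranks` and `friedman_statistic` are hypothetical helper names.

```python
def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Tie-corrected Friedman chi-square for n blocks (rows) by k treatments (columns)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        # Accumulate sum(t^3 - t) over the tie groups (of size t) in this row.
        counts = {}
        for v in row:
            counts[v] = counts.get(v, 0) + 1
        tie_term += sum(t ** 3 - t for t in counts.values())
    q = 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)
    correction = 1.0 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("every row is fully tied; the statistic is undefined")
    return q / correction


# Toy table: three blocks that rank the three treatments identically.
print(friedman_statistic([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))  # 6.0
```

The p-value is the upper tail of a chi-square distribution with k - 1 degrees of freedom, for example `scipy.stats.chi2.sf(q, k - 1)`. SciPy's `scipy.stats.friedmanchisquare` computes the same tie-corrected statistic and its p-value directly when each treatment (at least three) is passed as a separate sequence.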

CONCLUSIONS

Comprehensive, systematic, and transparent reporting of the data-cleaning process is important both for the validity of research studies and for data utilization. The semi-automatic procedures used here improved data quality for secondary analyses, an improvement that could not have been secured by automated procedures alone.


Fig. 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cbe6/7664027/c27e8407759d/12911_2020_1315_Fig1_HTML.jpg
