Handling Missing Data in COVID-19 Incidence Estimation: Secondary Data Analysis.

Affiliations

School of Preventive Medicine and Public Health, Hanoi Medical University, 1 Ton That Tung Street, Kim Lien Ward, Dong Da District, Hanoi, 100000, Vietnam, 84 368-577-4236.

UMass Chan Medical School, University of Massachusetts Medical School, Worcester, MA, United States.

Publication information

JMIR Public Health Surveill. 2024 Aug 20;10:e53719. doi: 10.2196/53719.

DOI:10.2196/53719
PMID:39166439
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11350390/
Abstract

BACKGROUND

The COVID-19 pandemic has revealed significant challenges in disease forecasting and in developing a public health response, emphasizing the need to manage missing data from various sources in making accurate forecasts.

OBJECTIVE

We aimed to show how handling missing data can affect estimates of the COVID-19 incidence rate (CIR) in different pandemic situations.

METHODS

This study used data from the COVID-19/SARS-CoV-2 surveillance system at the National Institute of Hygiene and Epidemiology, Vietnam. We separated the available data set into 3 distinct periods: zero COVID-19, transition, and new normal. To simulate data missing completely at random, we randomly removed 5% to 30% of the values, in increments of 5%, from the daily COVID-19 caseload variable. We selected 7 analytical methods to assess the effects of handling missing data and calculated statistical and epidemiological indices to measure the effectiveness of each method.
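
The masking step in this design can be sketched roughly as follows. This is only an illustration, not the authors' code: the helper name `mask_mcar`, the fixed seed, and the daily counts are all assumptions introduced here, and the counts are synthetic rather than taken from the surveillance data.

```python
import random

def mask_mcar(series, fraction, seed=0):
    """Return a copy of `series` with `fraction` of its values set to None.

    Positions are drawn uniformly at random, so missingness does not depend
    on the values themselves (missing completely at random).
    """
    rng = random.Random(seed)
    n_missing = round(len(series) * fraction)
    missing_idx = set(rng.sample(range(len(series)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(series)]

# Hypothetical daily case counts (not data from the study).
daily_cases = [3, 0, 5, 8, 13, 21, 18, 9, 4, 2, 7, 11, 16, 25, 30, 22, 14, 6, 1, 0]

# Missingness levels from 5% to 30% in steps of 5%.
levels = [p / 100 for p in range(5, 35, 5)]
masked = {level: mask_mcar(daily_cases, level) for level in levels}
```

Each entry of `masked` can then be passed to an imputation method and compared against the complete series.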

RESULTS

Our study examined missing data imputation performance across 3 study time periods: zero COVID-19 (n=3149), transition (n=1290), and new normal (n=9288). Imputation analyses showed that K-nearest neighbor (KNN) had the lowest mean absolute percentage change (APC) in CIR across the range (5% to 30%) of missing data. For instance, with 15% missing data, KNN resulted in 10.6%, 10.6%, and 9.7% average bias across the zero COVID-19, transition, and new normal periods, respectively, compared to 39.9%, 51.9%, and 289.7% with the maximum likelihood method. When missing data were imputed in the zero COVID-19 period, the autoregressive integrated moving average model showed the greatest mean APC in the mean number of confirmed COVID-19 cases during each COVID-19 containment cycle (CCC), rising from 226.3% at the 5% missing level to 6955.7% at the 30% missing level. Median imputation had the lowest bias in the average number of confirmed cases in each CCC at all levels of missing data. Specifically, in the 20% missing scenario, median imputation had an average bias of 16.3% for confirmed cases in each CCC, which was lower than the KNN figure, whereas maximum likelihood imputation showed the highest average bias, 92.4%. During the new normal period in the 25% and 30% missing data scenarios, KNN imputation had average biases ranging from 21% to 32% for both CIR and confirmed cases in each CCC, while maximum likelihood and moving average imputation showed average biases above 250% for both measures.
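
To make the bias figures concrete, here is a small sketch of how an absolute percentage change in an incidence rate might be computed after median imputation. The abstract does not give the formula, so the definition used here (absolute difference relative to the complete-data estimate, times 100) is an assumption, as are the function names, the population size, and the toy case counts.

```python
import statistics

def impute_median(series):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in series if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in series]

def incidence_rate(series, population, per=100_000):
    """Total cases over the period, expressed per `per` people."""
    return sum(series) / population * per

def absolute_percentage_change(reference, estimate):
    """Assumed definition: |estimate - reference| / reference * 100."""
    return abs(estimate - reference) / reference * 100

# Hypothetical values (not data from the study).
complete = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
with_gaps = [10, 20, 30, 40, 50, 60, None, 80, 90, None]  # 20% missing
population = 1_000_000

true_cir = incidence_rate(complete, population)
imputed_cir = incidence_rate(impute_median(with_gaps), population)
apc = absolute_percentage_change(true_cir, imputed_cir)
print(f"APC in CIR after median imputation: {apc:.1f}%")
```

Repeating this over many random masks and averaging the results would give a mean APC per method and missingness level, which is the kind of quantity the results compare.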

CONCLUSIONS

Our study emphasizes that the imputation method chosen by investigators should be tailored to the specific epidemiological context and data collection environment to ensure reliable estimates of the CIR.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b1/11350390/5466a1f9ea8d/publichealth-v10-e53719-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b1/11350390/d4853591168d/publichealth-v10-e53719-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b1/11350390/8506c5ee7486/publichealth-v10-e53719-g002.jpg