Comparing Multiple Imputation Methods to Address Missing Patient Demographics in Immunization Information Systems: Retrospective Cohort Study.

Author information

Brown Sara, Kudia Ousswa, Kleine Kaye, Kidd Bryndan, Wines Robert, Meckes Nathanael

Affiliations

Scientific Services - Analytics, Scientific Technologies Corporation (United States), 411 S 1st St, Phoenix, AZ, 85004, United States, +1 480-745-8500.

Immunization Services, West Virginia Department of Health and Human Services, Charleston, WV, United States.

Publication information

JMIR Public Health Surveill. 2025 Aug 26;11:e73916. doi: 10.2196/73916.

DOI:10.2196/73916
PMID:40857554
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12380239/
Abstract

BACKGROUND

Immunization Information Systems (IIS) and surveillance data are essential for public health interventions and programming; however, missing data are a frequent challenge that can introduce bias and reduce the accuracy of vaccine coverage assessments, particularly when assessing disparities.

OBJECTIVE

This study aimed to evaluate the performance of 3 multiple imputation methods (Stata's [StataCorp LLC] multiple imputation using chained equations [MICE], scikit-learn's Iterative-Imputer, and Python's miceforest package) in handling missing race and ethnicity data in large-scale surveillance datasets. We compared the methods on how well they preserved demographic distributions and on their computational efficiency, and we performed G-tests on contingency tables to obtain likelihood ratio statistics assessing the association between race and ethnicity and flu vaccination status.
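For reference, a G-test (likelihood ratio test) on an r x c contingency table compares the observed counts O with the counts E expected under independence, E = row total x column total / N, via G = 2 Σ O ln(O/E). The pure-Python sketch below computes G and its degrees of freedom, (r - 1)(c - 1). The function name `g_test` and the counts are hypothetical. The P value would come from the upper tail of a chi-square distribution with those degrees of freedom (for example, `scipy.stats.chi2.sf(g, dof)`). That step is left out here to keep the block dependency-free.

```python
import math


def g_test(table):
    """Return (G statistic, degrees of freedom) for an r x c table of counts.

    G = 2 * sum(O * ln(O / E)), where E is the count expected under
    independence of rows and columns. Cells with O == 0 contribute 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # the limit of O * ln(O / E) as O -> 0 is 0
            expected = row_totals[i] * col_totals[j] / n
            g += observed * math.log(observed / expected)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g, dof


# Hypothetical counts: rows = demographic group, columns = (vaccinated, not vaccinated)
counts = [
    [10, 20],
    [30, 40],
]
g, dof = g_test(counts)
print(f"G = {g:.4f}, dof = {dof}")  # G = 0.8043, dof = 1
```

A table whose observed counts already equal the expected counts gives G = 0. Larger G values indicate stronger departure from independence.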

METHODS

In this retrospective cohort study, we analyzed 2021-2022 flu vaccination and demographic data from the West Virginia Immunization Information System (N=2,302,036), in which race was missing for 15% of records and ethnicity for 34%. MICE, Iterative-Imputer, and miceforest were each used to impute the missing variables, generating 15 imputed datasets per method. We compared computational efficiency, preservation of demographic distributions, and spatial clustering patterns across methods, using G-statistics for the hypothesis tests.
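Stata's `mi impute chained` fits a separate model for each incomplete variable (for a nominal variable such as race, typically multinomial logistic regression via `mlogit`). It then cycles through the variables, each time conditioning on the current values of the others. The toy pure-Python sketch below shows only that cycling structure. It replaces the regression models with empirical conditional distributions (a hot-deck-style draw), ignores parameter uncertainty, and uses hypothetical column names and placeholder category codes. It illustrates the chained-equations idea and is not the study's implementation. In scikit-learn, the analogous multiple-imputation route is to fit `IterativeImputer(sample_posterior=True, random_state=k)` once per imputation k.

```python
import random
from collections import defaultdict


def chained_impute(rows, columns, n_iter=5, seed=0):
    """Return one imputed copy of `rows` using a toy chained-equations loop.

    Hypothetical simplification: each column's "model" is the empirical
    distribution of its originally observed values, grouped by the current
    values of all other columns. Missing cells (None) are redrawn from that
    conditional distribution on every sweep. Parameter uncertainty is ignored.
    """
    rng = random.Random(seed)
    missing = {c: [i for i, r in enumerate(rows) if r[c] is None] for c in columns}
    data = [dict(r) for r in rows]  # copy so the input rows are not modified

    # Initialise each missing cell with a draw from the column's observed marginal.
    for c in columns:
        observed = [r[c] for r in rows if r[c] is not None]
        for i in missing[c]:
            data[i][c] = rng.choice(observed)

    for _ in range(n_iter):
        for c in columns:
            if not missing[c]:
                continue  # fully observed columns act only as predictors
            others = [o for o in columns if o != c]
            missing_set = set(missing[c])
            # "Fit": pool observed values of c by the other columns' current values.
            model = defaultdict(list)
            for i, r in enumerate(data):
                if i not in missing_set:
                    model[tuple(r[o] for o in others)].append(r[c])
            marginal = [v for pool in model.values() for v in pool]
            # "Predict": redraw each missing cell from its conditional pool,
            # falling back to the marginal when that covariate pattern is unseen.
            for i in missing[c]:
                key = tuple(data[i][o] for o in others)
                data[i][c] = rng.choice(model.get(key) or marginal)
    return data


# Hypothetical records; race/ethnicity codes are placeholders, not IIS values.
rows = [
    {"county": "A", "vaccinated": 1, "race": "R1", "ethnicity": "E1"},
    {"county": "A", "vaccinated": 0, "race": None, "ethnicity": "E1"},
    {"county": "B", "vaccinated": 1, "race": "R2", "ethnicity": None},
    {"county": "B", "vaccinated": 0, "race": "R2", "ethnicity": "E2"},
    {"county": "B", "vaccinated": 1, "race": None, "ethnicity": None},
    {"county": "A", "vaccinated": 1, "race": "R1", "ethnicity": "E2"},
]
columns = ["county", "vaccinated", "race", "ethnicity"]

# m = 15 imputed datasets, mirroring the study's 15 imputations per method.
imputations = [chained_impute(rows, columns, seed=k) for k in range(15)]
```

Each imputed dataset keeps every observed value and fills only the missing cells. Using a different seed per imputation is what makes the 15 completed datasets differ from one another.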

RESULTS

After imputation, 780,339 more observations were available for analysis than under complete case analysis. All imputation methods exhibited significant spatial clustering for race imputation (G-statistics: MICE=26,452.7, Iterative-Imputer=128,280.3, miceforest=26,891.5; all P<.001), whereas ethnicity imputation showed variable clustering patterns (G-statistics: MICE=1142.2, P<.001; Iterative-Imputer=1.7; miceforest=2185.0, P<.001). MICE and miceforest best preserved the proportional distribution of demographics. Computational efficiency varied widely: for 15 imputations, MICE required 14 hours, Iterative-Imputer 2 minutes, and miceforest 10 minutes. Postimputation estimates indicated a 0.87%-18% reduction in stratified flu vaccination coverage rates, and the overall estimated flu vaccination rate decreased from 26% to 19% after imputation.
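The abstract reports pooled postimputation coverage estimates but does not state how the 15 imputed datasets were combined. A standard choice, assumed here, is Rubin's rules. The pooled estimate is the mean of the m per-imputation estimates, and its total variance is T = W + (1 + 1/m)B. Here W is the mean within-imputation variance and B is the between-imputation variance. The sketch below applies these rules to a coverage proportion whose within-imputation variance is p(1 - p)/n. The function name `pool_proportions` and the example inputs are hypothetical.

```python
import math
from statistics import mean, variance


def pool_proportions(estimates, n):
    """Pool m >= 2 per-imputation proportions with Rubin's rules.

    estimates: coverage proportion computed on each imputed dataset
    n: number of records each proportion is computed over
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)                       # pooled point estimate
    w = mean(p * (1 - p) / n for p in estimates)  # within-imputation variance
    b = variance(estimates)                       # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b                       # total variance
    return q_bar, math.sqrt(t)


# Hypothetical coverage estimates from 3 imputed datasets, each over 10,000 records.
pooled, se = pool_proportions([0.18, 0.19, 0.20], n=10_000)
print(f"pooled coverage = {pooled:.3f}, SE = {se:.4f}")  # pooled coverage = 0.190, SE = 0.0122
```

When every imputation gives the same estimate, B = 0 and the standard error reduces to the single-dataset binomial one. Disagreement between imputations inflates it, which is how the uncertainty introduced by the missing data is carried into the final estimate.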

CONCLUSIONS

Both MICE and miceforest offer flexible and reliable approaches for imputing missing demographic data while mitigating bias compared with Iterative-Imputer. Our results also highlight that the choice of imputation method can profoundly affect research findings. Although MICE and miceforest had better effect sizes and reliability, MICE was far more computationally expensive and time-consuming, which limits its use in large surveillance datasets. The miceforest package can run on cloud-based computing, which further improves efficiency by offloading resource-intensive tasks, enabling parallel execution, and minimizing processing delays. The significant decrease in vaccination coverage estimates illustrates how incomplete or missing data can obscure real disparities. Our findings support the routine application of imputation methods in immunization surveillance to improve health equity evaluations and to inform targeted public health interventions and programming.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0103/12380239/9a3330ef2cf3/publichealth-v11-e73916-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0103/12380239/9dc06e685648/publichealth-v11-e73916-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0103/12380239/bfc453ed65b6/publichealth-v11-e73916-g002.jpg

Similar articles

1
Comparing Multiple Imputation Methods to Address Missing Patient Demographics in Immunization Information Systems: Retrospective Cohort Study.
JMIR Public Health Surveill. 2025 Aug 26;11:e73916. doi: 10.2196/73916.
2
Prescription of Controlled Substances: Benefits and Risks
3
Does the Presence of Missing Data Affect the Performance of the SORG Machine-learning Algorithm for Patients With Spinal Metastasis? Development of an Internet Application Algorithm.
Clin Orthop Relat Res. 2024 Jan 1;482(1):143-157. doi: 10.1097/CORR.0000000000002706. Epub 2023 Jun 12.
4
Surveillance for Violent Deaths - National Violent Death Reporting System, 50 States, the District of Columbia, and Puerto Rico, 2022.
MMWR Surveill Summ. 2025 Jun 12;74(5):1-42. doi: 10.15585/mmwr.ss7405a1.
5
Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?
Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22.
6
Immunogenicity and seroefficacy of pneumococcal conjugate vaccines: a systematic review and network meta-analysis.
Health Technol Assess. 2024 Jul;28(34):1-109. doi: 10.3310/YWHA3079.
7
Comparison of self-administered survey questionnaire responses collected using mobile apps versus other methods.
Cochrane Database Syst Rev. 2015 Jul 27;2015(7):MR000042. doi: 10.1002/14651858.MR000042.pub2.
8
Falls prevention interventions for community-dwelling older adults: systematic review and meta-analysis of benefits, harms, and patient values and preferences.
Syst Rev. 2024 Nov 26;13(1):289. doi: 10.1186/s13643-024-02681-3.
9
Surveillance for Violent Deaths - National Violent Death Reporting System, 48 States, the District of Columbia, and Puerto Rico, 2020.
MMWR Surveill Summ. 2023 May 26;72(5):1-38. doi: 10.15585/mmwr.ss7205a1.
10
Regional cerebral blood flow single photon emission computed tomography for detection of Frontotemporal dementia in people with suspected dementia.
Cochrane Database Syst Rev. 2015 Jun 23;2015(6):CD010896. doi: 10.1002/14651858.CD010896.pub2.
