A method for comparing multiple imputation techniques: A case study on the U.S. national COVID cohort collaborative.

Affiliations

AnacletoLab, Department of Computer Science "Giovanni degli Antoni", Università degli Studi di Milano, Milan, Italy; CINI, Infolife National Laboratory, Roma, Italy; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA.

Publication information

J Biomed Inform. 2023 Mar;139:104295. doi: 10.1016/j.jbi.2023.104295. Epub 2023 Jan 27.

DOI:10.1016/j.jbi.2023.104295
PMID:36716983
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10683778/
Abstract

Healthcare datasets obtained from Electronic Health Records have proven to be extremely useful for assessing associations between patients' predictors and outcomes of interest. However, these datasets often suffer from missing values in a high proportion of cases, whose removal may introduce severe bias. Several multiple imputation algorithms have been proposed to attempt to recover the missing information under an assumed missingness mechanism. Each algorithm presents strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithm works best in a given scenario. Furthermore, the selection of each algorithm's parameters and data-related modeling choices are also both crucial and challenging. In this paper we propose a novel framework to numerically evaluate strategies for handling missing data in the context of statistical analysis, with a particular focus on multiple imputation techniques. We demonstrate the feasibility of our approach on a large cohort of type-2 diabetes patients provided by the National COVID Cohort Collaborative (N3C) Enclave, where we explored the influence of various patient characteristics on outcomes related to COVID-19. Our analysis included classic multiple imputation techniques as well as simple complete-case Inverse Probability Weighted models. Extensive experiments show that our approach can effectively highlight the most promising and performant missing-data handling strategy for our case study. Moreover, our methodology allowed a better understanding of the behavior of the different models and of how it changed as we modified their parameters. Our method is general and can be applied to different research fields and on datasets containing heterogeneous types.
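
The abstract does not spell out the evaluation procedure, so the following is only a minimal sketch of one common way to compare missing-data strategies numerically, not the authors' framework. It assumes a fully observed synthetic dataset, hides part of one covariate with a missingness probability that depends on an observed covariate, and then measures how far each strategy's regression coefficients fall from the full-data ("gold") estimates. All names (`ols`, `pool_rubin`, `impute_once`), the linear data-generating model, and the simplified imputation step (stochastic regression imputation without parameter draws) are illustrative assumptions. The pooling step uses Rubin's rules.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols(X, y):
    """Ordinary least squares with intercept; returns coefficients and their variances."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (len(y) - X1.shape[1])
    return beta, sigma2 * np.diag(np.linalg.inv(X1.T @ X1))

def pool_rubin(estimates, variances):
    """Rubin's rules: pooled estimate and total variance over m imputed datasets."""
    est = np.asarray(estimates)
    m = est.shape[0]
    within = np.mean(variances, axis=0)      # average within-imputation variance
    between = est.var(axis=0, ddof=1)        # between-imputation variance
    return est.mean(axis=0), within + (1 + 1 / m) * between

def impute_once(x1, x2, y, miss, rng):
    """Simplified stochastic regression imputation of x2 from x1 and y (an assumption)."""
    obs = ~miss
    Z = np.column_stack([x1, y])
    coef, _ = ols(Z[obs], x2[obs])
    pred = np.column_stack([np.ones(len(y)), Z]) @ coef
    resid_sd = np.std(x2[obs] - pred[obs], ddof=3)
    filled = x2.copy()
    filled[miss] = pred[miss] + rng.normal(0, resid_sd, miss.sum())
    return filled

# Synthetic, fully observed data (hypothetical; stands in for a complete cohort).
n = 2000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)
gold, _ = ols(np.column_stack([x1, x2]), y)   # reference estimates on complete data

# Hide x2 with a probability that depends on the observed x1.
miss = rng.random(n) < 1 / (1 + np.exp(-x1))
x2_obs = np.where(miss, np.nan, x2)

# Strategy 1: complete-case analysis (drop rows with missing x2).
cc, _ = ols(np.column_stack([x1, x2_obs])[~miss], y[~miss])

# Strategy 2: multiple imputation with m = 20 imputed datasets, pooled by Rubin's rules.
fits = [ols(np.column_stack([x1, impute_once(x1, x2_obs, y, miss, rng)]), y)
        for _ in range(20)]
mi, mi_var = pool_rubin([f[0] for f in fits], [f[1] for f in fits])

# Compare each strategy to the gold estimates (absolute coefficient error).
for name, est in [("complete-case", cc), ("multiple imputation", mi)]:
    print(name, np.abs(est - gold).round(3))
```

In this toy setting the missingness depends only on a covariate that is included in the model, so both strategies should land close to the gold estimates. The value of such a harness lies in swapping in other missingness mechanisms, imputation models, or parameter settings and seeing how the errors change.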

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a275/10683778/c0ad05ee427c/nihms-1932184-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a275/10683778/242f7c7673af/nihms-1932184-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a275/10683778/70f8bfc4db7c/nihms-1932184-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a275/10683778/00641de0c7d8/nihms-1932184-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a275/10683778/267e27f9113d/nihms-1932184-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a275/10683778/fb1835caa397/nihms-1932184-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a275/10683778/031ed114a224/nihms-1932184-f0007.jpg