Analysis of Missingness Scenarios for Observational Health Data.

Authors

Zamanian Alireza, von Kleist Henrik, Ciora Octavia-Andreea, Piperno Marta, Lancho Gino, Ahmidi Narges

Affiliations

Department of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich, 85748 Munich, Germany.

Fraunhofer Institute for Cognitive Systems IKS, 80686 Munich, Germany.

Publication

J Pers Med. 2024 May 11;14(5):514. doi: 10.3390/jpm14050514.

DOI:10.3390/jpm14050514
PMID:38793096
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11122060/

Abstract

Despite the extensive literature on missing data theory and cautionary articles emphasizing the importance of realistic analysis for healthcare data, a critical gap persists in incorporating domain knowledge into missing data methods. In this paper, we argue that the remedy is to identify the key scenarios that lead to data missingness and investigate their theoretical implications. Based on this proposal, we first introduce an analysis framework where we investigate how different observation agents, such as physicians, influence the data availability and then scrutinize each scenario with respect to the steps in the missing data analysis. We apply this framework to the case study of observational data in healthcare facilities. We identify ten fundamental missingness scenarios and show how they influence the identification step for missing data graphical models, inverse probability weighting estimation, and exponential tilting sensitivity analysis. To emphasize how domain-informed analysis can improve method reliability, we conduct simulation studies under the influence of various missingness scenarios. We compare the results of three common methods in medical data analysis: complete-case analysis, MissForest imputation, and inverse probability weighting estimation. The experiments are conducted for two objectives: variable mean estimation and classification accuracy. We advocate for our analysis approach as a reference for observational health data analysis. Beyond that, we also posit that the proposed analysis framework is applicable to other medical domains.
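
To make the mean-estimation objective concrete, the following is a minimal sketch of how complete-case analysis and inverse probability weighting can differ when missingness depends on an observed covariate. It is not the authors' simulation: the data-generating model, the coefficients, and the function names (`simulate`, `complete_case_mean`, `ipw_mean`) are all illustrative assumptions. The observation probability is also taken as known, whereas in practice it would have to be estimated.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, seed=0):
    """Generate (x, y, observed, p_obs) rows under an assumed toy model.

    x is a fully observed covariate, y is the outcome of interest, and
    y is more likely to be recorded when x is large (missingness depends
    only on the observed x). All coefficients are made up for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 + x + rng.gauss(0.0, 1.0)   # true mean of y is 2.0
        p_obs = sigmoid(0.5 + 1.5 * x)       # assumed known propensity
        observed = rng.random() < p_obs
        rows.append((x, y, observed, p_obs))
    return rows


def complete_case_mean(rows):
    """Average y over the rows where y was observed, ignoring the rest."""
    ys = [y for _, y, observed, _ in rows if observed]
    return sum(ys) / len(ys)


def ipw_mean(rows):
    """Normalized inverse probability weighting estimate of the mean of y.

    Each observed y is weighted by 1 / P(observed | x), so rows that were
    unlikely to be observed stand in for the similar rows that are missing.
    """
    num = sum(y / p for _, y, observed, p in rows if observed)
    den = sum(1.0 / p for _, _, observed, p in rows if observed)
    return num / den


TRUE_MEAN = 2.0
rows = simulate(20000, seed=0)
cc = complete_case_mean(rows)
ipw = ipw_mean(rows)
print(f"true mean      : {TRUE_MEAN:.3f}")
print(f"complete-case  : {cc:.3f}")
print(f"IPW (known p)  : {ipw:.3f}")
```

In this toy setup the complete-case average is pulled toward patients with large `x`, because they are over-represented among the observed rows, while the weighted estimate corrects for that selection. The correction depends on the assumed missingness mechanism being right, which is the kind of domain knowledge the abstract argues should inform the analysis.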
