Assumptions and analysis planning in studies with missing data in multiple variables: moving beyond the MCAR/MAR/MNAR classification.

Affiliations

Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Melbourne, Australia.

Department of Paediatrics, University of Melbourne, Australia.

Publication information

Int J Epidemiol. 2023 Aug 2;52(4):1268-1275. doi: 10.1093/ije/dyad008.

DOI: 10.1093/ije/dyad008
PMID: 36779333
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10396404/
Abstract

Researchers faced with incomplete data are encouraged to consider whether their data are 'missing completely at random' (MCAR), 'missing at random' (MAR) or 'missing not at random' (MNAR) when planning their analysis. However, there are two major problems with this classification as originally defined by Rubin in the 1970s. First, when there are missing data in multiple variables, the plausibility of the MAR assumption is difficult to assess using substantive knowledge and is more stringent than is generally appreciated. Second, although MCAR and MAR are sufficient conditions for consistent estimation with specific methods, they are not necessary conditions and therefore this categorization does not directly determine the best approach for handling the missing data in an analysis. How best to handle missing data depends on the assumed causal relationships between variables and their missingness, and what these relationships imply in terms of the 'recoverability' of the target estimand (the population parameter that encodes the answer to the underlying research question). Recoverability is defined as whether the estimand can be consistently estimated from the patterns and associations in the observed data without needing to invoke external information on the extent to which the distribution of missing values might differ from that of observed values. In this manuscript we outline an approach for deciding which method to use to handle multivariable missing data in an analysis, using directed acyclic graphs to depict missingness assumptions and determining the implications in terms of recoverability of the target estimand.
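
The abstract's second point, that MCAR and MAR are sufficient but not necessary for consistent estimation, can be illustrated with a small simulation. The sketch below is not taken from the paper: the data-generating model, the function name `simulate` and all coefficients are assumptions chosen for illustration. The exposure `x` goes missing with a probability that depends on `x` itself (MNAR under Rubin's classification), but not on the outcome `y` once `x` is given. Under that assumed mechanism a complete-case regression of `y` on `x` should still recover the slope, while the complete-case mean of `x` should not recover the population mean, which shows that recoverability depends on the target estimand.

```python
import math
import random


def simulate(n=50_000, seed=0):
    """Return (complete-case slope, complete-case mean of x, full-data mean of x).

    Hypothetical setup: y = 1 + 2*x + noise, so the true slope is 2 and the
    true mean of x is 0.
    """
    rng = random.Random(seed)
    xs, ys, seen = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        # Assumed missingness mechanism: the chance of observing x depends on
        # the value of x itself, but not on y once x is given.
        p_obs = 1.0 / (1.0 + math.exp(-(0.5 - 1.5 * x)))
        xs.append(x)
        ys.append(y)
        seen.append(rng.random() < p_obs)

    # Complete cases: records in which x was observed.
    cc = [(x, y) for x, y, s in zip(xs, ys, seen) if s]
    mean_x = sum(x for x, _ in cc) / len(cc)
    mean_y = sum(y for _, y in cc) / len(cc)

    # Ordinary least-squares slope of y on x among complete cases.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in cc)
    sxx = sum((x - mean_x) ** 2 for x, _ in cc)
    return sxy / sxx, mean_x, sum(xs) / n


if __name__ == "__main__":
    slope_cc, mean_cc, mean_full = simulate()
    print(f"complete-case slope: {slope_cc:.3f}")
    print(f"complete-case mean of x: {mean_cc:.3f}")
    print(f"full-data mean of x: {mean_full:.3f}")
```

In this sketch the complete-case slope stays close to the true value of 2, whereas the complete-case mean of `x` is pulled below 0 because larger values of `x` are more likely to be missing. A single scenario like this is only an illustration; the paper's approach is to encode such assumptions in a directed acyclic graph and reason about recoverability from it.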


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d088/10396404/59cbe521ac0e/dyad008f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d088/10396404/a5d208dd847c/dyad008f1.jpg
