ML versus MI for Missing Data with Violation of Distribution Conditions.

Authors

Yuan Ke-Hai, Yang-Wallentin Fan, Bentler Peter M

Affiliations

University of Notre Dame.

Uppsala University, Sweden.

Publication

Sociol Methods Res. 2012 Nov;41(4):598-629. doi: 10.1177/0049124112460373.

DOI:10.1177/0049124112460373

PMID:24764604

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3995817/

Abstract

Normal-distribution-based maximum likelihood (ML) and multiple imputation (MI) are the two major procedures for missing data analysis. This article compares the two procedures with respect to bias and efficiency of parameter estimates. It also compares formula-based standard errors (SEs) for each procedure against the corresponding empirical SEs. The results indicate that parameter estimates by MI tend to be less efficient than those by ML, and that the estimates of variance-covariance parameters by MI are also more biased. In particular, when the population for the observed variables possesses heavy tails, estimates of variance-covariance parameters by MI may contain severe bias even at relatively large sample sizes. Although ML performs much better, its parameter estimates may also contain substantial bias at smaller sample sizes. The results also indicate that, when the underlying population is close to normally distributed, SEs based on the sandwich-type covariance matrix and those based on the observed information matrix are very comparable to empirical SEs with either ML or MI. When the underlying distribution has heavier tails, SEs based on the sandwich-type covariance matrix for ML estimates are more reliable than those based on the observed information matrix. Both empirical results and analysis show that neither SEs based on the observed information matrix nor those based on the sandwich-type covariance matrix can provide consistent SEs in MI. Thus, ML is preferable to MI in practice, although parameter estimates by MI might still be consistent.
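
The contrast between information-based and sandwich-type SEs can be illustrated with a minimal sketch. It is not the article's simulation design: it assumes complete data (no missingness), a single variance parameter, and a Student t population with 5 degrees of freedom as a stand-in for "heavy tails". The function names, sample size, and replication count are hypothetical choices made for illustration. For the normal-theory MLE of a variance, the information-based SE is sigma^2 * sqrt(2/n), while the sandwich-type SE is sqrt((m4 - sigma^4)/n), where m4 is the fourth central moment.

```python
import numpy as np

def variance_mle_ses(x):
    """Return (sigma2_hat, info_se, sandwich_se) for the normal-theory MLE of a variance."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    sigma2 = np.mean(d ** 2)                       # normal-theory MLE of the variance
    m4 = np.mean(d ** 4)                           # sample fourth central moment
    info_se = sigma2 * np.sqrt(2.0 / n)            # SE from the normal-theory information
    sandwich_se = np.sqrt((m4 - sigma2 ** 2) / n)  # sandwich-type (robust) SE
    return sigma2, info_se, sandwich_se

def simulate(draw, n=200, reps=2000, seed=0):
    """Return (empirical SE, mean information SE, mean sandwich SE) over replications.

    `draw(rng, n)` is an illustrative callback that generates one sample of size n.
    """
    rng = np.random.default_rng(seed)
    out = np.array([variance_mle_ses(draw(rng, n)) for _ in range(reps)])
    empirical_se = out[:, 0].std(ddof=1)           # SD of the estimates across replications
    return empirical_se, out[:, 1].mean(), out[:, 2].mean()

# Normal population versus an assumed heavy-tailed population (t with 5 df).
normal = simulate(lambda rng, n: rng.standard_normal(n))
heavy = simulate(lambda rng, n: rng.standard_t(5, size=n))

for label, (emp, info, sand) in (("normal", normal), ("t(5)", heavy)):
    print(f"{label:>6}: empirical={emp:.3f}  information={info:.3f}  sandwich={sand:.3f}")
```

Under the normal population the two formula-based SEs should both track the empirical SE closely. Under the t(5) population the information-based SE should fall well below the empirical SE, with the sandwich-type SE closer to it, which is the complete-data analogue of the pattern the abstract reports for ML.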
