Model Selection Criteria for Missing-Data Problems Using the EM Algorithm.

Authors

Ibrahim Joseph G, Zhu Hongtu, Tang Niansheng

Affiliations

Joseph G. Ibrahim is Alumni Distinguished Professor (E-mail:

Publication information

J Am Stat Assoc. 2008 Dec 1;103(484):1648-1658. doi: 10.1198/016214508000001057.

DOI: 10.1198/016214508000001057
PMID: 19693282
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2728244/
Abstract

We consider novel methods for the computation of model selection criteria in missing-data problems based on the output of the EM algorithm. The methodology is very general and can be applied to numerous situations involving incomplete data within an EM framework, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Toward this goal, we develop a class of information criteria for missing-data problems, called $\mathrm{IC}_{H,Q}$, which yields the Akaike information criterion and the Bayesian information criterion as special cases. The computation of $\mathrm{IC}_{H,Q}$ requires an analytic approximation to a complicated function, called the H-function, along with output from the EM algorithm used in obtaining maximum likelihood estimates. The approximation to the H-function leads to a large class of information criteria, called $\mathrm{IC}_{\tilde{H}^{(k)},Q}$. Theoretical properties of $\mathrm{IC}_{\tilde{H}^{(k)},Q}$, including consistency, are investigated in detail. To eliminate the analytic approximation to the H-function, a computationally simpler approximation to $\mathrm{IC}_{H,Q}$, called $\mathrm{IC}_Q$, is proposed, the computation of which depends solely on the Q-function of the EM algorithm. Advantages and disadvantages of $\mathrm{IC}_{\tilde{H}^{(k)},Q}$ and $\mathrm{IC}_Q$ are discussed and examined in detail in the context of missing-data problems. Extensive simulations are given to demonstrate the methodology and examine the small-sample and large-sample performance of $\mathrm{IC}_{\tilde{H}^{(k)},Q}$ and $\mathrm{IC}_Q$ in missing-data problems. An AIDS data set also is presented to illustrate the proposed methodology.
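As a hedged illustration of the quantities the abstract relies on: for any parameter values $\theta$ and $\theta'$, the observed-data log-likelihood splits as $\ell_o(\theta) = Q(\theta \mid \theta') - H(\theta \mid \theta')$. Here $Q$ is the expected complete-data log-likelihood and $H$ is the expected log-density of the missing data given the observed data, both taken under $\theta'$. The Python sketch below checks this identity on a toy problem of our own choosing, not one from the paper. The toy model is a bivariate normal $(y_1, y_2)$ in which $y_2$ is missing at random given $y_1$, fitted by EM with and without the covariance parameter. It then compares AIC and BIC with a Q-only criterion; AIC and BIC need $H$ because they use $\ell_o = Q - H$. We assume the Q-only criterion has the form $-2Q(\hat\theta \mid \hat\theta) + 2d$, with $d$ free parameters, by analogy with AIC; the paper's $\mathrm{IC}_Q$ penalty may differ. In this toy model $H$ has a closed form, so no approximation such as $\tilde{H}^{(k)}$ is needed. All function names (`simulate`, `conditional`, `obs_loglik`, `q_and_h`, `em`, `criteria`) are hypothetical.

```python
import math
import random


def simulate(n=200, rho=0.6, seed=1):
    """(y1, y2) standard bivariate normal with correlation rho; y2 is deleted
    with probability 0.5 whenever y1 > 0, i.e. missing at random given y1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        y2 = rho * y1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        data.append((y1, None if y1 > 0 and rng.random() < 0.5 else y2))
    return data


def norm_logpdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def conditional(y1, theta):
    """Mean and variance of y2 | y1 under theta = (mu1, mu2, s11, s12, s22)."""
    mu1, mu2, s11, s12, s22 = theta
    return mu2 + s12 / s11 * (y1 - mu1), s22 - s12 ** 2 / s11


def obs_loglik(data, theta):
    """Observed-data log-likelihood: bivariate normal density for complete rows,
    marginal N(mu1, s11) density of y1 for rows whose y2 is missing."""
    mu1, mu2, s11, s12, s22 = theta
    det = s11 * s22 - s12 ** 2
    ll = 0.0
    for y1, y2 in data:
        if y2 is None:
            ll += norm_logpdf(y1, mu1, s11)
        else:
            d1, d2 = y1 - mu1, y2 - mu2
            quad = (s22 * d1 ** 2 - 2.0 * s12 * d1 * d2 + s11 * d2 ** 2) / det
            ll += -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
    return ll


def q_and_h(data, theta, theta_prev):
    """Q(theta | theta_prev) and H(theta | theta_prev): expectations over each
    missing y2 use its conditional law given y1 at theta_prev."""
    q = h = 0.0
    for y1, y2 in data:
        q += norm_logpdf(y1, theta[0], theta[2])
        m, v = conditional(y1, theta)
        if y2 is not None:
            q += norm_logpdf(y2, m, v)
        else:
            m0, v0 = conditional(y1, theta_prev)
            # E[log N(y2; m, v)] when y2 ~ N(m0, v0); enters both Q and H
            e = -0.5 * (math.log(2.0 * math.pi * v) + ((m0 - m) ** 2 + v0) / v)
            q += e
            h += e
    return q, h


def em(data, diagonal=False, max_iter=1000, tol=1e-10):
    """EM for the bivariate normal; diagonal=True fixes s12 = 0 (independence)."""
    n = len(data)
    theta = (0.0, 0.0, 1.0, 0.0, 1.0)
    history = [obs_loglik(data, theta)]
    for _ in range(max_iter):
        # E-step: expected complete-data sufficient statistics
        t1 = t2 = t11 = t12 = t22 = 0.0
        for y1, y2 in data:
            if y2 is None:
                m, v = conditional(y1, theta)
                e2, e22 = m, v + m ** 2
            else:
                e2, e22 = y2, y2 ** 2
            t1, t2, t11 = t1 + y1, t2 + e2, t11 + y1 ** 2
            t12, t22 = t12 + y1 * e2, t22 + e22
        # M-step: complete-data MLEs evaluated at the expected statistics
        mu1, mu2 = t1 / n, t2 / n
        s12 = 0.0 if diagonal else t12 / n - mu1 * mu2
        theta = (mu1, mu2, t11 / n - mu1 ** 2, s12, t22 / n - mu2 ** 2)
        history.append(obs_loglik(data, theta))
        if abs(history[-1] - history[-2]) < tol:
            break
    return theta, history


def criteria(data, diagonal):
    theta, history = em(data, diagonal)
    d, n = (4 if diagonal else 5), len(data)
    loglik = obs_loglik(data, theta)
    q, h = q_and_h(data, theta, theta)
    return {"loglik": loglik, "Q": q, "H": h, "history": history,
            "AIC": -2.0 * loglik + 2 * d,
            "BIC": -2.0 * loglik + d * math.log(n),
            # Assumed Q-only analogue with an AIC-type penalty (hypothetical form)
            "IC_Q": -2.0 * q + 2 * d}
```

For example, `criteria(simulate(), diagonal=False)["AIC"]` and `criteria(simulate(), diagonal=True)["AIC"]` can be compared directly. With the default correlation of 0.6, all three criteria should favor the correlated model. The observed-data log-likelihood is computed from the bivariate density rather than from $Q - H$, so the identity check is not circular.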

Similar articles

1
Model Selection Criteria for Missing-Data Problems Using the EM Algorithm.
J Am Stat Assoc. 2008 Dec 1;103(484):1648-1658. doi: 10.1198/016214508000001057.
2
Variable selection for regression models with missing data.
Stat Sin. 2010 Jan;20(1):149-165.
3
Bayesian analysis for generalized linear models with nonignorably missing covariates.
Biometrics. 2005 Sep;61(3):767-80. doi: 10.1111/j.1541-0420.2005.00338.x.
4
Variable selection in the Cox regression model with covariates missing at random.
Biometrics. 2010 Mar;66(1):97-104. doi: 10.1111/j.1541-0420.2009.01274.x. Epub 2009 May 18.
5
Likelihood methods for incomplete longitudinal binary responses with incomplete categorical covariates.
Biometrics. 1999 Mar;55(1):214-23. doi: 10.1111/j.0006-341x.1999.00214.x.
6
Fixed and random effects selection in mixed effects models.
Biometrics. 2011 Jun;67(2):495-503. doi: 10.1111/j.1541-0420.2010.01463.x. Epub 2010 Jul 21.
7
Selecting the model for multiple imputation of missing data: Just use an IC!
Stat Med. 2021 May 10;40(10):2467-2497. doi: 10.1002/sim.8915. Epub 2021 Feb 24.
8
Empirical-likelihood-based criteria for model selection on marginal analysis of longitudinal data with dropout missingness.
Biometrics. 2019 Sep;75(3):950-965. doi: 10.1111/biom.13060. Epub 2019 Apr 25.
9
Theory and Inference for Regression Models with Missing Responses and Covariates.
J Multivar Anal. 2008 Jul;99(6):1302-1331. doi: 10.1016/j.jmva.2007.08.009.
10
Monte Carlo EM for missing covariates in parametric regression models.
Biometrics. 1999 Jun;55(2):591-6. doi: 10.1111/j.0006-341x.1999.00591.x.

Cited by

1
Bayesian semiparametric inference in longitudinal metabolomics data.
Sci Rep. 2024 Dec 28;14(1):31336. doi: 10.1038/s41598-024-82718-8.
2
Envelope method with ignorable missing data.
Electron J Stat. 2021;15(2):4420-4461. doi: 10.1214/21-ejs1881. Epub 2021 Sep 14.
3
Penalized regression for multiple types of many features with missing data.
Stat Sin. 2023 Apr;33(2):633-662. doi: 10.5705/ss.202020.0401.
4
Multidimensional variability in ecological assessments predicts two clusters of suicidal patients.
Sci Rep. 2023 Mar 2;13(1):3546. doi: 10.1038/s41598-023-30085-1.
5
Discovery of Intentional Self-Harm Patterns from Suicide and Self-Harm Surveillance Reports.
Healthc Inform Res. 2022 Oct;28(4):319-331. doi: 10.4258/hir.2022.28.4.319. Epub 2022 Oct 31.
6
Modelling the impact of antimicrobial use and external introductions on commensal E. coli colistin resistance in small-scale chicken farms of the Mekong delta of Vietnam.
Transbound Emerg Dis. 2022 Sep;69(5):e2185-e2194. doi: 10.1111/tbed.14558. Epub 2022 May 5.
7
Estimating the AUC with a Graphical Lasso Method for High-dimensional Biomarkers with LOD.
Biostat Epidemiol. 2021;5(2):189-206. doi: 10.1080/24709360.2021.1898731. Epub 2021 Mar 17.
8
On the Treatment of Missing Item Responses in Educational Large-Scale Assessment Data: An Illustrative Simulation Study and a Case Study Using PISA 2018 Mathematics Data.
Eur J Investig Health Psychol Educ. 2021 Dec 14;11(4):1653-1687. doi: 10.3390/ejihpe11040117.
9
A Dynamic Model for Imputing Missing Medical Data: A Multiobjective Particle Swarm Optimization Algorithm.
J Healthc Eng. 2021 Oct 8;2021:1203726. doi: 10.1155/2021/1203726. eCollection 2021.
10
Selecting the model for multiple imputation of missing data: Just use an IC!
Stat Med. 2021 May 10;40(10):2467-2497. doi: 10.1002/sim.8915. Epub 2021 Feb 24.