
Gaussian-based routines to impute categorical variables in health surveys.

Affiliation

Department of Epidemiology and Biostatistics, School of Public Health, University at Albany, SUNY, One University Place, Rensselaer, NY 12144-3456, USA.

Publication information

Stat Med. 2011 Dec 20;30(29):3447-60. doi: 10.1002/sim.4355. Epub 2011 Oct 4.

DOI: 10.1002/sim.4355
PMID: 21976366
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3263356/

Abstract

The multivariate normal (MVN) distribution is arguably the most popular parametric model used in imputation and is available in most software packages (e.g., SAS PROC MI, R package norm). When it is applied to categorical variables as an approximation, practitioners often either apply simple rounding techniques for ordinal variables or create a distinct 'missing' category and/or disregard the nominal variable from the imputation phase. All of these practices can potentially lead to biased and/or uninterpretable inferences. In this work, we develop a new rounding methodology calibrated to preserve observed distributions to multiply impute missing categorical covariates. The major attractiveness of this method is its flexibility to use any 'working' imputation software, particularly those based on MVN, allowing practitioners to obtain usable imputations with small biases. A simulation study demonstrates the clear advantage of the proposed method in rounding ordinal variables and, in some scenarios, its plausibility in imputing nominal variables. We illustrate our methods on a widely used National Survey of Children with Special Health Care Needs where incomplete values on race posed a valid threat on inferences pertaining to disparities.
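
The idea of rounding "calibrated to preserve observed distributions" can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not spell out. It assumes one simple reading: choose cutoffs on the continuous imputation scale so that the rounded values reproduce the category proportions among the observed cases. The helper names `calibrated_cutoffs` and `round_with_cutoffs`, the quantile-matching rule, and the simulated draws are all hypothetical.

```python
import numpy as np

def calibrated_cutoffs(calibration_draws, observed_values, categories):
    """Cutoffs that make rounded calibration draws match observed proportions.

    Hypothetical helper: `categories` lists the ordinal levels in increasing order.
    """
    observed_values = np.asarray(observed_values)
    draws = np.asarray(calibration_draws, dtype=float)
    # Proportion of each ordered category among the observed cases.
    props = np.array([np.mean(observed_values == c) for c in categories])
    # Cumulative proportions (excluding the final 1.0) are the quantile levels.
    cum_levels = np.clip(np.cumsum(props)[:-1], 0.0, 1.0)
    return np.quantile(draws, cum_levels)

def round_with_cutoffs(draws, cutoffs, categories):
    """Assign each continuous draw to the category whose interval contains it."""
    idx = np.searchsorted(cutoffs, np.asarray(draws, dtype=float), side="left")
    return np.asarray(categories)[idx]

# Simulated stand-ins for the output of an MVN-based imputation routine (assumed).
rng = np.random.default_rng(0)
categories = [1, 2, 3]
observed = np.repeat(categories, [50, 30, 20])  # observed shares: 0.5, 0.3, 0.2
calibration_draws = rng.normal(loc=1.7, scale=0.8, size=2000)
missing_draws = rng.normal(loc=1.7, scale=0.8, size=500)

cutoffs = calibrated_cutoffs(calibration_draws, observed, categories)
imputed = round_with_cutoffs(missing_draws, cutoffs, categories)
calibrated_share = np.mean(
    round_with_cutoffs(calibration_draws, cutoffs, categories) == 1
)
```

In this sketch the cutoffs are estimated once from draws for cases whose category is known and then applied to the draws for the missing cases. Fixed rounding to the nearest integer, by contrast, uses the same cutoffs regardless of the observed distribution.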

