


Multiple Imputation with Massive Data: An Application to the Panel Study of Income Dynamics.

Authors

Si Yajuan, Heeringa Steve, Johnson David, Little Roderick J A, Liu Wenshuo, Pfeffer Fabian, Raghunathan Trivellore

Affiliations

Research Assistant Professor, Survey Research Center, Institute for Social Research, University of Michigan, 426 Thompson St., Ann Arbor, MI 48104, USA.

Senior Research Scientist, Survey Research Center, Institute for Social Research, University of Michigan, 426 Thompson St., Ann Arbor, MI 48104, USA.

Publication information

J Surv Stat Methodol. 2021 Oct 19;11(1):260-283. doi: 10.1093/jssam/smab038. eCollection 2023 Feb.

DOI:10.1093/jssam/smab038
PMID:36714298
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9874997/
Abstract

Multiple imputation (MI) is a popular and well-established method for handling missing data in multivariate data sets, but its practicality for use in massive and complex data sets has been questioned. One such data set is the Panel Study of Income Dynamics (PSID), a longstanding and extensive survey of household income and wealth in the United States. Missing data for this survey are currently handled using traditional hot deck methods because of the simple implementation; however, the univariate hot deck results in large random wealth fluctuations. MI is effective but faced with operational challenges. We use a sequential regression/chained-equation approach, using the software IVEware, to multiply impute cross-sectional wealth data in the 2013 PSID, and compare analyses of the resulting imputed data with those from the current hot deck approach. Practical difficulties, such as non-normally distributed variables, skip patterns, categorical variables with many levels, and multicollinearity, are described together with our approaches to overcoming them. We evaluate the imputation quality and validity with internal diagnostics and external benchmarking data. MI produces improvements over the existing hot deck approach by helping preserve correlation structures, such as the associations between PSID wealth components and the relationships between the household net worth and sociodemographic factors, and facilitates completed data analyses with general purposes. MI incorporates highly predictive covariates into imputation models and increases efficiency. We recommend the practical implementation of MI and expect greater gains when the fraction of missing information is large.
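The contrast the abstract draws between a univariate hot deck and regression-based multiple imputation can be illustrated with a small sketch. This is not the authors' IVEware setup or the PSID data: the toy variables `x` and `y`, the 40% missingness rate, and the helper names `hot_deck`, `regression_impute`, and `rubin_combine` are all assumptions made for illustration. The regression step is a single simplified chained-equation step that does not draw the regression parameters from their posterior, so it is not a fully proper MI procedure.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


def hot_deck(y, rng):
    """Univariate hot deck: fill each gap with a randomly chosen observed value."""
    donors = [v for v in y if v is not None]
    return [v if v is not None else rng.choice(donors) for v in y]


def regression_impute(x, y, rng):
    """Stochastic regression imputation of y given x (simplified: no parameter draws)."""
    obs = [(a, b) for a, b in zip(x, y) if b is not None]
    ox, oy = zip(*obs)
    mx, my = statistics.fmean(ox), statistics.fmean(oy)
    slope = sum((a - mx) * (b - my) for a, b in obs) / sum((a - mx) ** 2 for a in ox)
    intercept = my - slope * mx
    # Residual spread of the fitted line, used to add noise to each imputed value.
    resid_sd = statistics.stdev([b - (intercept + slope * a) for a, b in obs])
    return [b if b is not None else intercept + slope * a + rng.gauss(0, resid_sd)
            for a, b in zip(x, y)]


def rubin_combine(estimates, variances):
    """Rubin's rules: pooled estimate, total variance, approximate fraction of missing information."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    w = statistics.fmean(variances)          # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance
    t = w + (1 + 1 / m) * b                  # total variance
    return q_bar, t, (1 + 1 / m) * b / t


# Toy data (assumed): y depends on x, and 40% of y is missing completely at random.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y_full = [0.8 * a + rng.gauss(0, 0.6) for a in x]
y_miss = [b if rng.random() > 0.4 else None for b in y_full]

r_full = pearson(x, y_full)
r_hot = pearson(x, hot_deck(y_miss, rng))

# M completed data sets from the regression-based imputation.
M = 5
completed = [regression_impute(x, y_miss, rng) for _ in range(M)]
r_mi_mean = statistics.fmean(pearson(x, yc) for yc in completed)

# Pool the estimated mean of y across the M completed data sets.
means = [statistics.fmean(yc) for yc in completed]
mean_vars = [statistics.variance(yc) / n for yc in completed]
pooled_mean, total_var, fmi = rubin_combine(means, mean_vars)

print("correlation, full data:         ", round(r_full, 3))
print("correlation, hot deck:          ", round(r_hot, 3))
print("correlation, regression MI mean:", round(r_mi_mean, 3))
print("pooled mean of y:", round(pooled_mean, 3), "total variance:", total_var)
```

In this toy setting the hot deck ignores `x` when choosing donors, so the correlation between `x` and the completed `y` is attenuated, whereas the regression-based imputations stay close to the full-data correlation. That is the kind of correlation-structure preservation the abstract reports for MI on PSID wealth components.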
