Do the methods used to analyse missing data really matter? An examination of data from an observational study of Intermediate Care patients.

Authors

Kaambwa Billingsley, Bryan Stirling, Billingham Lucinda

Affiliation

Health Economics Unit, University of Birmingham, Edgbaston, Birmingham, United Kingdom.

Publication details

BMC Res Notes. 2012 Jun 27;5:330. doi: 10.1186/1756-0500-5-330.

DOI:10.1186/1756-0500-5-330

PMID:22738344

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3441253/

Abstract

BACKGROUND

Missing data are a common statistical problem in healthcare datasets from populations of older people. Some argue that arbitrarily assuming the mechanism responsible for the missingness, and therefore the method for dealing with it, is not the best option, but is this always true? This paper explores what happens when extra information suggesting that a particular mechanism is responsible for missing data is disregarded and methods for dealing with the missing data are chosen arbitrarily. Regression models based on 2,533 intermediate care (IC) patients from the largest evaluation of IC conducted and published in the UK to date were used to explain variation in costs, EQ-5D and Barthel index. Three methods for dealing with missingness were used, each assuming a different mechanism as being responsible for the missing data: complete case analysis (assuming missing completely at random, MCAR), multiple imputation (assuming missing at random, MAR) and the Heckman selection model (assuming missing not at random, MNAR). Differences in results were gauged by examining the signs of coefficients as well as the sizes of both coefficients and associated standard errors.
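
The three approaches named above can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration and is not taken from the study: the variable names (`x`, `z`, `y`), the simulated MNAR mechanism, the number of imputations, and the hand-written probit fit. It is not the authors' implementation; it only shows the general shape of complete case analysis, multiple imputation with Rubin's rule for the point estimate, and a Heckman two-step estimator.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (hypothetical, not the IC study data).
x = rng.normal(size=n)                     # covariate in the outcome model
z = rng.normal(size=n)                     # affects observation only (exclusion restriction)
u = rng.normal(size=n)                     # selection-equation error
e = 0.8 * u + 0.6 * rng.normal(size=n)     # outcome error, corr(e, u) = 0.8
y = 1.0 + 2.0 * x + e                      # true slope on x is 2.0
observed = (0.5 + x + z + u) > 0           # MNAR: missingness depends on u, hence on y given x

def ols(X, y):
    """Return OLS coefficients and the residual variance."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid / (len(y) - X.shape[1])

X = np.column_stack([np.ones(n), x])
Xo, yo = X[observed], y[observed]

# 1. Complete case analysis (valid under MCAR): use observed rows only.
beta_cc, s2_cc = ols(Xo, yo)

# 2. Multiple imputation (valid under MAR): draw parameters, impute, refit, pool.
m = 20                                     # number of imputations (assumed)
XtX_inv = np.linalg.inv(Xo.T @ Xo)
df = len(yo) - Xo.shape[1]
n_mis = int((~observed).sum())
estimates = []
for _ in range(m):
    sigma2 = s2_cc * df / rng.chisquare(df)                  # draw residual variance
    beta_draw = rng.multivariate_normal(beta_cc, sigma2 * XtX_inv)
    y_imp = y.copy()
    y_imp[~observed] = X[~observed] @ beta_draw + rng.normal(
        scale=math.sqrt(sigma2), size=n_mis)
    estimates.append(ols(X, y_imp)[0])
beta_mi = np.mean(estimates, axis=0)       # Rubin's rule for the point estimate

# 3. Heckman two-step (allows MNAR): probit for selection, then OLS with
#    the inverse Mills ratio as an extra regressor.
norm_cdf = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))

def norm_pdf(t):
    return np.exp(-0.5 * t ** 2) / math.sqrt(2.0 * math.pi)

W = np.column_stack([np.ones(n), x, z])
s = observed.astype(float)
gamma = np.zeros(W.shape[1])
for _ in range(50):                        # Fisher scoring for the probit model
    idx = W @ gamma
    p = np.clip(norm_cdf(idx), 1e-10, 1 - 1e-10)
    d = norm_pdf(idx)
    score = W.T @ ((s - p) * d / (p * (1 - p)))
    info = (W * (d ** 2 / (p * (1 - p)))[:, None]).T @ W
    step = np.linalg.solve(info, score)
    gamma += step
    if np.max(np.abs(step)) < 1e-8:
        break

idx_o = W[observed] @ gamma
mills = norm_pdf(idx_o) / np.clip(norm_cdf(idx_o), 1e-12, None)
beta_heck, _ = ols(np.column_stack([Xo, mills]), yo)

print("slope on x  CCA: %.3f  MI: %.3f  Heckman: %.3f"
      % (beta_cc[1], beta_mi[1], beta_heck[1]))
```

Because only the outcome is missing and the imputation model uses the same covariates as the analysis model, the complete case and multiple imputation slopes are expected to be close, while the Heckman estimate can differ when the simulated data really are MNAR.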

RESULTS

Extra information strongly suggested that missing cost data were MCAR. MCAR- and MAR-based methods yielded similar results, with the sizes of most coefficients and standard errors differing by less than 3.4%, while results from the MNAR-based method were statistically different (up to 730% bigger). Significant variables in all regression models also had the same direction of influence on costs. All three mechanisms of missingness were shown to be potential causes of the missing EQ-5D and Barthel data. The method chosen to deal with missing data did not seem to have any significant effect on the results for these data, as all methods led to broadly similar conclusions, with sizes of coefficients and standard errors differing by less than 54% and 322%, respectively.
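
As a rough illustration of how such comparisons can be computed, the sketch below reports the percentage difference in size and the agreement in sign between two coefficient estimates. The abstract does not state the exact formula used, so taking the difference relative to a reference estimate is an assumption, and the function names and numbers are hypothetical.

```python
def pct_diff(reference, other):
    """Absolute difference of `other` from `reference`, as a percentage
    of the reference estimate (assumed definition; reference must be nonzero)."""
    return 100.0 * abs(other - reference) / abs(reference)

def same_sign(a, b):
    """True when two nonzero coefficients point in the same direction."""
    return a * b > 0

# Hypothetical coefficients for one covariate under two methods.
coef_reference = 100.0
coef_other = 103.0

print(pct_diff(coef_reference, coef_other))   # 3.0
print(same_sign(coef_reference, coef_other))  # True
```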

CONCLUSIONS

Arbitrary selection of methods to deal with missing data should be avoided. Using extra information gathered during the data collection exercise about the cause of missingness to guide this selection would be more appropriate.
