Missing data and imputation: a practical illustration in a prognostic study on low back pain.

Authors

Vergouw David, Heymans Martijn W, van der Windt Daniëlle A W M, Foster Nadine E, Dunn Kate M, van der Horst Henriette E, de Vet Henrica C W

Affiliation

EMGO+ Institute for Research in Extramural Medicine, Department of Methodology and Applied Biostatistics, VU University Medical Centre, Amsterdam, The Netherlands.

Publication information

J Manipulative Physiol Ther. 2012 Jul;35(6):464-71. doi: 10.1016/j.jmpt.2012.07.002.

Abstract

OBJECTIVE

When prediction models are developed using complete case analysis (CCA), missing information in either baseline characteristics (predictors) or outcomes may lead to biased results. Multiple imputation (MI) has been shown to be suitable for obtaining unbiased results. This study provides researchers with an empirical illustration of the use of MI in a data set on low back pain by comparing MI with the more commonly used CCA. It shows how imputing missing information affects the composition and performance of prognostic models, distinguishing between imputation of missing values in baseline characteristics and in outcome data.

METHODS

Data came from the Beliefs about Backpain cohort, a study of psychologic obstacles to recovery in primary care back pain patients in the United Kingdom. Candidate predictors included demographics, back pain characteristics, and psychologic variables. Complete case analysis was compared with MI within patients with complete outcome but missing baseline data (n=809) and patients with missing baseline or outcome data (n=1591). Multiple imputation was performed by a Multiple Imputation by Chained Equations procedure.
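To make the idea of chained equations concrete, the following is a minimal sketch and not the procedure used in the study. It assumes only two continuous variables, each imputed from the other by simple linear regression plus random noise; the names `fit_line` and `chained_imputation` are hypothetical. Real MICE software handles many variables and variable types and also draws the regression parameters from their posterior distribution, which this sketch omits.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x, plus residual SD."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - intercept - slope * x for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sd


def chained_imputation(cols, n_cycles=10, seed=0):
    """Impute None values in a dict of exactly two numeric columns.

    Simplified assumption: each column is regressed on the other one,
    and missing entries are replaced by prediction + Gaussian noise.
    """
    rng = random.Random(seed)
    names = list(cols)
    missing = {k: [v is None for v in cols[k]] for k in names}

    # Starting point: fill every gap with the observed column mean.
    filled = {}
    for k in names:
        mean = statistics.fmean(v for v in cols[k] if v is not None)
        filled[k] = [mean if v is None else v for v in cols[k]]

    # Cycle through the variables, re-imputing each from the other.
    for _ in range(n_cycles):
        for target, predictor in (names, names[::-1]):
            rows = [i for i, m in enumerate(missing[target]) if not m]
            b0, b1, sd = fit_line(
                [filled[predictor][i] for i in rows],
                [filled[target][i] for i in rows],
            )
            for i, m in enumerate(missing[target]):
                if m:
                    pred = b0 + b1 * filled[predictor][i]
                    filled[target][i] = pred + rng.gauss(0.0, sd)
    return filled


if __name__ == "__main__":
    # Made-up illustrative data, not data from the cohort.
    data = {
        "pain": [3.0, None, 5.0, 7.0, 6.0, None, 2.0, 8.0],
        "fear": [2.5, 4.0, None, 6.5, 6.0, 3.0, 2.0, None],
    }
    print(chained_imputation(data))
```

Running the function several times with different seeds yields several completed data sets, which is the "multiple" part of multiple imputation; each would then be analysed separately and the results pooled.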

RESULTS

Cases with missing outcome data (n=782, 49.1%) or with missing baseline data (n=116, 8%) both differed from complete cases regarding the distribution of some predictors and more often had a poor outcome. When CCA was compared with MI, model composition was shown to be affected.

CONCLUSIONS

Complete case analysis can give biased results, even when only small amounts of data are missing. Now that MI is available in standard statistical software, we recommend that it be used to handle missing data.
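The bias that complete case analysis can introduce is easy to reproduce in a toy simulation. The sketch below is an illustration on synthetic data, not a reanalysis of the study: it assumes a single predictor `x`, an outcome `y = x + noise`, and outcome missingness that depends on `x` only. The function name `simulate` and all numeric settings are hypothetical. The imputation step is simplified: it adds random noise to regression predictions but does not propagate uncertainty in the regression coefficients, as full MI software does.

```python
import random
import statistics


def simulate(n=5000, m=5, seed=0):
    """Return (full-data mean, complete-case mean, imputed mean) of y."""
    rng = random.Random(seed)

    # Synthetic data: the true mean of y is 0.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [xi + rng.gauss(0.0, 1.0) for xi in x]

    # Outcome is missing far more often when the predictor x is high.
    observed = [rng.random() > (0.8 if xi > 0 else 0.2) for xi in x]

    # Complete case analysis: keep only rows with an observed outcome.
    xc = [xi for xi, o in zip(x, observed) if o]
    yc = [yi for yi, o in zip(y, observed) if o]
    cca_mean = statistics.fmean(yc)

    # Regression of y on x, fitted on the complete cases.
    mx = statistics.fmean(xc)
    sxx = sum((xi - mx) ** 2 for xi in xc)
    slope = sum((xi - mx) * (yi - cca_mean) for xi, yi in zip(xc, yc)) / sxx
    intercept = cca_mean - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(xc, yc))
    resid_sd = (sse / (len(xc) - 2)) ** 0.5

    # Build m completed data sets and average their estimates.
    estimates = []
    for _ in range(m):
        completed = [
            yi if o else intercept + slope * xi + rng.gauss(0.0, resid_sd)
            for xi, yi, o in zip(x, y, observed)
        ]
        estimates.append(statistics.fmean(completed))

    return statistics.fmean(y), cca_mean, statistics.fmean(estimates)


if __name__ == "__main__":
    full, cca, mi = simulate()
    print(f"full data: {full:.3f}  complete cases: {cca:.3f}  imputed: {mi:.3f}")
```

In this setup the complete-case mean is pulled well below the full-data mean because patients with high `x` are under-represented, whereas the imputed estimate stays close to it because the imputation model uses `x` to fill in the missing outcomes.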
