The proportion of missing data should not be used to guide decisions on multiple imputation.

Author information

Population Health Sciences, Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK.

Population Health Sciences, Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK; MRC Integrative Epidemiology Unit, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK.

Publication information

J Clin Epidemiol. 2019 Jun;110:63-73. doi: 10.1016/j.jclinepi.2019.02.016. Epub 2019 Mar 13.

DOI:10.1016/j.jclinepi.2019.02.016

PMID:30878639

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6547017/

Abstract

OBJECTIVES

Researchers are concerned whether multiple imputation (MI) or complete case analysis should be used when a large proportion of data are missing. We aimed to provide guidance for drawing conclusions from data with a large proportion of missingness.

STUDY DESIGN AND SETTING

Via simulations, we investigated how the proportion of missing data, the fraction of missing information (FMI), and availability of auxiliary variables affected MI performance. Outcome data were missing completely at random or missing at random (MAR).
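The difference between the two missingness mechanisms can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation code: the function name `simulate_missing_outcome`, the sample size, the correlation `rho`, and the logistic missingness model are all assumptions chosen for illustration. It generates an outcome `y` and one auxiliary variable `z`, deletes about half of the outcomes either completely at random or with a probability that depends only on the observed `z` (MAR), and compares the complete-case mean of `y` under each mechanism.

```python
import numpy as np

def simulate_missing_outcome(n=20000, rho=0.6, seed=0):
    """Return outcome y, auxiliary z, and MCAR / MAR missingness indicators.

    All settings here are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                                   # auxiliary variable
    y = rho * z + np.sqrt(1 - rho**2) * rng.normal(size=n)   # outcome, true mean 0
    # MCAR: every outcome has the same 50% chance of being missing.
    miss_mcar = rng.random(n) < 0.5
    # MAR: the chance of being missing depends only on the observed z.
    p_miss = 1.0 / (1.0 + np.exp(-1.5 * z))
    miss_mar = rng.random(n) < p_miss
    return y, z, miss_mcar, miss_mar

y, z, miss_mcar, miss_mar = simulate_missing_outcome()
cc_mean_mcar = y[~miss_mcar].mean()   # complete-case mean under MCAR
cc_mean_mar = y[~miss_mar].mean()     # complete-case mean under MAR
print(f"true mean 0.0, MCAR complete-case {cc_mean_mcar:.3f}, "
      f"MAR complete-case {cc_mean_mar:.3f}")
```

In this sketch the complete-case mean stays close to the true value under MCAR but is pulled downward under MAR, because outcomes with high `z` (and hence high `y`) are more likely to be missing. An analysis that uses `z`, such as MI with `z` in the imputation model, is what the abstract refers to when it says auxiliary information can reduce bias.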

RESULTS

Provided sufficient auxiliary information was available, MI was beneficial in terms of bias and never detrimental in terms of efficiency. Models with similar FMI values but differing proportions of missing data also had similar precision for effect estimates. In the absence of bias, the FMI was a better guide than the proportion of missing data to the efficiency gains from MI.
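For readers unfamiliar with the FMI, it is usually obtained as a by-product of pooling the per-imputation results with Rubin's rules. The sketch below shows that standard calculation under stated assumptions: the function name `pool_rubin` and the example numbers are hypothetical and do not come from the paper. It takes the point estimates and their squared standard errors from `m` imputed data sets and returns the pooled estimate, total variance, and FMI.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m imputation results with Rubin's rules and return the FMI.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                  # pooled point estimate
    w = u.mean()                      # within-imputation variance
    b = q.var(ddof=1)                 # between-imputation variance
    t = w + (1 + 1 / m) * b           # total variance
    if b == 0:                        # no between-imputation variation
        return {"estimate": q_bar, "total_var": t, "df": np.inf, "fmi": 0.0}
    r = (1 + 1 / m) * b / w           # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2   # Rubin's (1987) degrees of freedom
    fmi = (r + 2 / (df + 3)) / (1 + r)
    return {"estimate": q_bar, "total_var": t, "df": df, "fmi": fmi}

# Hypothetical results from m = 5 imputed data sets.
pooled = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(pooled)
```

The FMI depends on how much the estimates vary between imputations relative to the total variance, not directly on the share of missing values. That is why two analyses with different proportions of missing data can have similar FMI values, and why informative auxiliary variables, which reduce between-imputation variance, can lower the FMI.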

CONCLUSION

We provide evidence that, for MAR data, valid MI reduces bias even when the proportion of missingness is large. We advise researchers to use the FMI to guide the choice of auxiliary variables for efficiency gains in imputation analyses, and we note that sensitivity analyses including different imputation models may be needed if the number of complete cases is small.
