Buhi Eric R, Goodson Patricia, Neilands Torsten B
Department of Community and Family Health, College of Public Health, University of South Florida, Tampa, FL 33612, USA.
Am J Health Behav. 2008 Jan-Feb;32(1):83-92. doi: 10.5555/ajhb.2008.32.1.83.
To describe and illustrate missing data mechanisms (missing completely at random [MCAR], missing at random [MAR], and not missing at random [NMAR]) and missing data techniques (MDTs), and to offer recommended best practices for addressing missingness.
We simulated data sets and employed ad hoc MDTs (deletion techniques, mean substitution) and sophisticated MDTs (full information maximum likelihood, Bayesian estimation, multiple imputation) in linear regression analyses.
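A design of this kind can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the authors' procedure: the sample size, the true slope of 0.5, the roughly 30% missingness rate, the logistic missingness model, and the names `missing_mask` and `compare` are all hypothetical. It deletes values of the outcome `y` under each mechanism and compares the regression slope from the complete data with the slopes from two ad hoc MDTs (listwise deletion and mean substitution).

```python
import numpy as np


def missing_mask(x, y, mechanism, rng, rate=0.3, strength=1.5):
    """Boolean mask marking which y values are treated as missing."""
    if mechanism == "MCAR":
        # Missingness is unrelated to any variable.
        p = np.full(len(y), rate)
    elif mechanism in ("MAR", "NMAR"):
        # MAR: depends on the observed predictor x.
        # NMAR: depends on the (unobserved) value of y itself.
        driver = x if mechanism == "MAR" else y
        z = (driver - driver.mean()) / driver.std()
        logit = np.log(rate / (1 - rate)) + strength * z
        p = 1 / (1 + np.exp(-logit))
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return rng.random(len(y)) < p


def compare(mechanism, n=2000, seed=0):
    """Slope of y on x from complete data and from two ad hoc MDTs."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal(size=n)  # assumed true slope: 0.5
    missing = missing_mask(x, y, mechanism, rng)

    # Listwise deletion: fit only the cases with y observed.
    listwise = np.polyfit(x[~missing], y[~missing], 1)[0]

    # Mean substitution: replace each missing y with the observed mean.
    y_filled = np.where(missing, y[~missing].mean(), y)
    mean_sub = np.polyfit(x, y_filled, 1)[0]

    return {
        "complete": np.polyfit(x, y, 1)[0],
        "listwise": listwise,
        "mean_substitution": mean_sub,
        "missing_rate": missing.mean(),
    }


for mechanism in ("MCAR", "MAR", "NMAR"):
    result = compare(mechanism)
    print(mechanism, {k: round(float(v), 3) for k, v in result.items()})
```

Placing the missingness on the outcome rather than a predictor is a design choice made only to keep the sketch short; which estimates are biased, and in which direction, depends on where the missing values fall and what drives them.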
MCAR data yielded unbiased parameter estimates across all MDTs, although deletion methods lost statistical power. NMAR results were biased towards larger values and greater significance. Under MAR, the sophisticated MDTs returned estimates closer to their original values than the ad hoc MDTs did.
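The MAR comparison can be illustrated with a minimal multiple imputation sketch. This is a simplified stand-in, assuming a single predictor `x` whose missingness depends only on the fully observed outcome `y`; it is not the software or the model used in the study. The helper names (`ols_slope`, `multiply_impute`, `demo`), the bootstrap approximation to parameter uncertainty, and all numeric settings are hypothetical. Each imputation refits a regression of `x` on `y` on a bootstrap resample of the complete cases, draws the missing `x` values with residual noise, and the per-imputation slopes are pooled with Rubin's rules.

```python
import numpy as np


def ols_slope(x, y):
    """Slope of y on x and its standard error from simple OLS."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])


def multiply_impute(x_obs, y, missing, m=20, seed=0):
    """Pooled slope of y on x and its standard error via Rubin's rules."""
    rng = np.random.default_rng(seed)
    observed = np.flatnonzero(~missing)
    slopes, variances = [], []
    for _ in range(m):
        # Bootstrap the complete cases to reflect parameter uncertainty.
        boot = rng.choice(observed, size=len(observed), replace=True)
        Z = np.column_stack([np.ones(len(boot)), y[boot]])
        coef, *_ = np.linalg.lstsq(Z, x_obs[boot], rcond=None)
        resid = x_obs[boot] - Z @ coef
        sd = np.sqrt(resid @ resid / (len(boot) - 2))

        # Draw each missing x from the fitted x-on-y model plus noise.
        x_imp = x_obs.copy()
        noise = rng.normal(scale=sd, size=missing.sum())
        x_imp[missing] = coef[0] + coef[1] * y[missing] + noise

        slope, se = ols_slope(x_imp, y)
        slopes.append(slope)
        variances.append(se ** 2)

    slopes = np.array(slopes)
    within = np.mean(variances)       # average sampling variance
    between = slopes.var(ddof=1)      # variance across imputations
    total = within + (1 + 1 / m) * between
    return slopes.mean(), np.sqrt(total)


def demo(n=5000, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal(size=n)  # assumed true slope: 0.5
    # MAR: the chance that x is missing rises with the observed y.
    missing = rng.random(n) < 1 / (1 + np.exp(-1.5 * (y - 1.0)))
    x_obs = np.where(missing, np.nan, x)

    complete = ols_slope(x, y)[0]
    listwise = ols_slope(x_obs[~missing], y[~missing])[0]
    mi, mi_se = multiply_impute(x_obs, y, missing)
    return {"complete": complete, "listwise": listwise,
            "mi": mi, "mi_se": mi_se}


print({k: round(float(v), 3) for k, v in demo().items()})
```

In this setup listwise deletion selects cases on the outcome, so its slope tends to drift from the complete-data value, while the imputation model uses the observed `y` to recover the missing `x` values. A production analysis would normally rely on an established multiple imputation or full information maximum likelihood implementation rather than hand-written code like this.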
State-of-the-art, readily available MDTs outperform ad hoc techniques.