Liu Benmei, Yu Mandi, Graubard Barry I, Troiano Richard P, Schenker Nathaniel
Division of Cancer Control and Population Science, National Cancer Institute, Rockville, MD, U.S.A.
Stat Med. 2016 Dec 10;35(28):5170-5188. doi: 10.1002/sim.7049. Epub 2016 Aug 2.
The Physical Activity Monitor component was introduced into the 2003-2004 National Health and Nutrition Examination Survey (NHANES) to collect objective information on physical activity, including both movement intensity counts and ambulatory steps. Because of an error in the accelerometer device initialization process, the steps data were missing for all participants in several primary sampling units (each typically a single county or a group of contiguous counties), even though those participants had intensity count data from their accelerometers. To avoid potential bias and loss of efficiency in estimation and inference involving the steps data, we considered methods to accurately impute the missing steps values in the 2003-2004 NHANES. The objective was to develop an efficient imputation method that minimized model-based assumptions. We adopted a multiple imputation approach based on additive regression, bootstrapping, and predictive mean matching. This method fits alternating conditional expectation (ACE) models, which use an automated procedure to estimate optimal transformations for both the predictor and response variables. This paper describes the approaches used in this imputation and evaluates the methods by comparing the distributions of the original and the imputed data. A simulation study using the observed data is also conducted as part of the model diagnostics. Finally, some real data analyses are performed to compare results before and after imputation. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
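To make the combination of bootstrapping and predictive mean matching named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It substitutes an ordinary linear regression for the ACE additive regression step, uses a single predictor, and runs on simulated toy data. The function name `pmm_multiple_impute`, its parameters, and the variables `counts` and `steps` are hypothetical names chosen for illustration.

```python
import numpy as np

def pmm_multiple_impute(x, y, n_imputations=5, k=5, seed=0):
    """Return n_imputations completed copies of y (NaN marks a missing value).

    Assumption: a linear model stands in for the ACE additive regression
    described in the abstract; this is an illustrative simplification.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])  # intercept + one predictor
    X_obs, y_obs = X[obs], y[obs]
    k = min(k, len(y_obs))  # cannot have more donors than observed cases
    completed = []
    for _ in range(n_imputations):
        # Bootstrap the observed cases so each imputation reflects
        # uncertainty in the fitted regression coefficients.
        idx = rng.integers(0, len(y_obs), size=len(y_obs))
        beta, *_ = np.linalg.lstsq(X_obs[idx], y_obs[idx], rcond=None)
        pred_obs = X_obs @ beta
        pred_mis = X[~obs] @ beta
        # Predictive mean matching: for each missing case, find the k
        # observed cases with the closest predicted means, pick one at
        # random, and borrow its actually observed value.
        dist = np.abs(pred_mis[:, None] - pred_obs[None, :])
        nearest = np.argsort(dist, axis=1)[:, :k]
        pick = rng.integers(0, k, size=len(pred_mis))
        donors = nearest[np.arange(len(pred_mis)), pick]
        y_new = y.copy()
        y_new[~obs] = y_obs[donors]
        completed.append(y_new)
    return completed

# Toy data (simulated, not NHANES): steps roughly proportional to counts.
rng = np.random.default_rng(1)
counts = rng.uniform(100, 500, size=200)
steps = 20 * counts + rng.normal(0, 500, size=200)
steps_missing = steps.copy()
steps_missing[:40] = np.nan  # pretend the first 40 step values were lost

imputed = pmm_multiple_impute(counts, steps_missing)
```

Because every imputed value is borrowed from an observed donor, the imputations stay within the range of real data, which is one reason predictive mean matching suits an approach intended to minimize model-based assumptions. Each completed dataset would then be analyzed separately and the results combined across imputations.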