Parr Christine L, Hjartåker Anette, Scheel Ida, Lund Eiliv, Laake Petter, Veierød Marit B
Institute of Basic Medical Sciences, Department of Biostatistics, University of Oslo, PO Box 1122 Blindern, N-0317 Oslo, Norway.
Public Health Nutr. 2008 Apr;11(4):361-70. doi: 10.1017/S1368980007000365. Epub 2007 Jul 2.
To investigate item non-response in a postal food-frequency questionnaire (FFQ) and to assess the effect of substituting or imputing missing values on estimated dietary intake levels in the Norwegian Women and Cancer study (NOWAC). We adapted k nearest neighbours (KNN) imputation and applied it, probably for the first time, to FFQ data.
Data from a recent reproducibility study were used. The FFQ was mailed twice (test-retest) about 3 months apart to the same subjects. Missing responses in the test FFQ were imputed using the null value (frequencies = null, amount = smallest), the sample mode, the sample median, KNN, and retest values.
A methodological substudy of NOWAC, a national population-based cohort.
A random sample of 2000 women aged 46-75 years was drawn from the cohort in 2002 (response rate 75%). The imputation methods were compared for the 1430 women who completed at least 50% of the test FFQ.
We imputed the 16% of values that were missing in the overall test data matrix. Compared with null value imputation, the largest differences in estimated dietary intake were seen for KNN, and for food items with a high proportion of missing values. Imputation with retest values increased total energy intake, indicating that not all missing values are caused by respondents failing to specify no consumption, and that null value imputation may lead to underestimation and misclassification.
Missing values in FFQs present a methodological challenge. We encourage the application and evaluation of newer imputation methods, including KNN, which may reduce imputation errors and give more accurate intake estimates.
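The imputation strategies compared above (null value, sample mode, sample median, KNN) can be illustrated with a minimal sketch. It is not the adapted KNN procedure used in the study, whose details are not given in this abstract. It assumes FFQ answers are coded as numeric frequency categories in a respondents-by-items matrix with `NaN` for missing answers; the function names, the mean squared distance over jointly answered items, and the use of the donor median are all assumptions made for illustration.

```python
import numpy as np

def impute_constant(X, value=0.0):
    """Null-value style imputation: every missing cell gets one fixed value."""
    out = X.copy()
    out[np.isnan(out)] = value
    return out

def impute_median(X):
    """Replace each missing cell with the median of its item (column)."""
    out = X.copy()
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(out))
    out[rows, cols] = medians[cols]
    return out

def impute_mode(X):
    """Replace each missing cell with the most frequent observed answer."""
    out = X.copy()
    for j in range(X.shape[1]):
        col = X[:, j]
        observed = col[~np.isnan(col)]
        if observed.size == 0:
            continue  # nothing observed for this item; leave it missing
        values, counts = np.unique(observed, return_counts=True)
        out[np.isnan(col), j] = values[np.argmax(counts)]  # ties: smallest value
    return out

def impute_knn(X, k=3):
    """Generic KNN imputation (assumed variant, not the study's adaptation).

    Distance between two respondents is the mean squared difference over
    items both answered; a missing item is filled with the median answer
    of the k nearest respondents who did answer that item.
    """
    out = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        shared = ~missing & ~missing[i]          # items answered by both
        diff = np.where(shared, X - X[i], 0.0)   # ignore non-shared items
        n_shared = shared.sum(axis=1)
        dist = np.full(X.shape[0], np.inf)
        ok = n_shared > 0
        dist[ok] = (diff[ok] ** 2).sum(axis=1) / n_shared[ok]
        dist[i] = np.inf                         # a row is not its own donor
        for j in np.where(missing[i])[0]:
            donors = np.where(~missing[:, j] & np.isfinite(dist))[0]
            if donors.size == 0:
                continue                         # no usable donor; leave missing
            order = np.argsort(dist[donors], kind="stable")[:k]
            out[i, j] = np.median(X[donors[order], j])
    return out

# Hypothetical toy data: 5 respondents, 3 food items, one missing answer.
answers = np.array([
    [1, 2, 3],
    [1, 2, np.nan],
    [5, 5, 0],
    [1, 3, 3],
    [5, 4, 0],
], dtype=float)
knn_filled = impute_knn(answers, k=2)
```

In this toy matrix the second respondent answers the first two items like the first and fourth respondents, so the KNN sketch borrows their answer to the third item, whereas a constant of zero would record no consumption. This mirrors the concern raised above that null value imputation may underestimate intake.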