Comparing methods for handling missing values in food-frequency questionnaires and proposing k nearest neighbours imputation: effects on dietary intake in the Norwegian Women and Cancer study (NOWAC).

Authors

Parr Christine L, Hjartåker Anette, Scheel Ida, Lund Eiliv, Laake Petter, Veierød Marit B

Affiliation

Institute of Basic Medical Sciences, Department of Biostatistics, University of Oslo, PO Box 1122 Blindern, N-0317 Oslo, Norway.

Publication details

Public Health Nutr. 2008 Apr;11(4):361-70. doi: 10.1017/S1368980007000365. Epub 2007 Jul 2.

Abstract

OBJECTIVE

To investigate item non-response in a postal food-frequency questionnaire (FFQ), and to assess the effect of substituting/imputing missing values on dietary intake levels in the Norwegian Women and Cancer study (NOWAC). We adapted k nearest neighbours (KNN) imputation and applied it to FFQ data, probably for the first time.

DESIGN

Data from a recent reproducibility study were used. The FFQ was mailed twice (test-retest) about 3 months apart to the same subjects. Missing responses in the test FFQ were imputed using the null value (frequencies = null, amount = smallest), the sample mode, the sample median, KNN, and retest values.
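The imputation strategies listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy matrix, the coding of the null value as 0, the distance (mean squared difference over items both respondents answered), the choice of k = 2, and the use of the neighbours' median are all assumptions made for illustration. The abstract does not describe how KNN was adapted to FFQ data, and the function names are hypothetical.

```python
from statistics import median, mode

# Toy FFQ matrix (hypothetical data): rows are respondents, columns are food
# items, values are frequency categories coded from 0 (never) upward.
# None marks a missing response.
ffq = [
    [0, 3, 2, 1],
    [0, 3, None, 1],
    [4, 1, 0, 5],
    [4, None, 0, 5],
    [1, 3, 2, 1],
    [5, 1, 0, 4],
]

def impute_constant(data, value=0):
    """Null value imputation: treat every missing response as `value`."""
    return [[value if v is None else v for v in row] for row in data]

def impute_column_stat(data, stat):
    """Fill each item with a sample statistic (e.g. mode or median) of its observed values."""
    fills = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        fills.append(stat(observed))
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in data]

def distance(a, b):
    """Mean squared difference over items that both respondents answered."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum((x - y) ** 2 for x, y in shared) / len(shared)

def impute_knn(data, k=2):
    """Fill each missing item with the median answer of the k most similar respondents."""
    result = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Donors are other respondents who answered item j.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(data)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != float("inf")]
            donors.sort(key=lambda d: d[0])
            nearest = [value for _, value in donors[:k]]
            if nearest:  # leave the gap if no comparable donor exists
                result[i][j] = median(nearest)
    return result

null_filled = impute_constant(ffq, 0)
mode_filled = impute_column_stat(ffq, mode)
median_filled = impute_column_stat(ffq, median)
knn_filled = impute_knn(ffq, k=2)
```

In this toy example the second respondent's missing item is filled with 0 by the null value, mode and median methods, but with 2 by KNN, because the two most similar respondents both reported 2. This shows how KNN can draw on a respondent's overall answer pattern rather than on a single sample-wide value.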

SETTING

A methodological substudy of NOWAC, a national population-based cohort.

SUBJECTS

A random sample of 2000 women aged 46-75 years was drawn from the cohort in 2002 (response rate 75%). The imputation methods were compared for the 1430 women who completed at least 50% of the test FFQ.

RESULTS

We imputed the 16% of values that were missing in the overall test data matrix. Compared to null value imputation, the largest differences in estimated dietary intake were seen for KNN, and for food items with a high proportion of missing values. Imputation with retest values increased total energy intake, indicating that not all missing values are caused by respondents failing to specify no consumption, and that null value imputation may lead to underestimation and misclassification.

CONCLUSION

Missing values in FFQs present a methodological challenge. We encourage the application and evaluation of newer imputation methods, including KNN, which may reduce imputation errors and give more accurate intake estimates.
