Abellana Sangra Rosa, Farran Codina Andreu
Department of Public Health, Faculty of Medicine, University of Barcelona.
Department of Nutrition and Food Science, Faculty of Pharmacy, University of Barcelona. Spain.
Nutr Hosp. 2015 Feb 26;31 Suppl 3:189-95. doi: 10.3305/nh.2015.31.sup3.8766.
Missing values and outliers inevitably appear in nutritional epidemiology studies. Missing values arise, for example, from the difficulty of collecting data in dietary surveys, which leads to missing data on the amounts of foods consumed or to poor descriptions of these foods. Inadequate treatment at the data processing stage can introduce bias and loss of accuracy and, consequently, lead to misinterpretation of the results. The objective of this article is to provide recommendations on the treatment of missing and outlier data, and guidance on existing software for determining sample sizes and performing statistical analysis. Recommendations on data collection are also provided, as an important preliminary step in any nutritional research. We discuss methods for dealing with missing values, especially case deletion, simple imputation and multiple imputation, with indications and examples. The identification of outliers, their impact on statistical analysis and the options available for their adequate treatment are explained, with illustrative examples. Finally, current software that fully or partially addresses these issues is mentioned, especially freely available software.
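As a rough illustration of three of the techniques named above (case deletion, simple imputation and outlier identification), the following is a minimal sketch using only the Python standard library. The intake values, the function names and the choice of Tukey's 1.5 x IQR rule are assumptions made for this example and are not taken from the article; multiple imputation is not shown because it requires a dedicated statistical package.

```python
import statistics

# Hypothetical daily energy intakes (kcal); None marks a missing record.
intakes = [1850.0, 2100.0, None, 1975.0, 2300.0, None, 2050.0, 9500.0]


def case_deletion(values):
    """Drop records with a missing value (complete-case analysis)."""
    return [v for v in values if v is not None]


def mean_imputation(values):
    """Simple imputation: replace each missing value with the observed mean."""
    fill = statistics.mean(case_deletion(values))
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule, assumed here)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


complete = case_deletion(intakes)   # 6 observed records remain
imputed = mean_imputation(intakes)  # 8 records, gaps filled with the mean
flagged = iqr_outliers(complete)    # candidate outliers among observed values

print(len(complete), len(imputed), flagged)
```

In this made-up sample the extreme value 9500 is flagged, and because it is included in the observed mean it also inflates the imputed values, which suggests why outliers are usually worth inspecting before a simple mean imputation is applied.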