Kwak Sang Kyu, Kim Jong Hae
Department of Medical Statistics, School of Medicine, Catholic University of Daegu, Daegu, Korea.
Department of Anesthesiology and Pain Medicine, School of Medicine, Catholic University of Daegu, Daegu, Korea.
Korean J Anesthesiol. 2017 Aug;70(4):407-411. doi: 10.4097/kjae.2017.70.4.407. Epub 2017 Jul 27.
Missing values and outliers are frequently encountered during data collection. Missing values reduce the amount of data available for analysis, which compromises the statistical power of the study and, ultimately, the reliability of its results. In addition, they can introduce significant bias into the results and reduce the efficiency of the data. Outliers strongly affect the estimation of statistics (e.g., the mean and standard deviation of a sample), resulting in overestimated or underestimated values. Therefore, the results of a data analysis depend considerably on how missing values and outliers are handled. This review discusses the types of missing values, methods for identifying outliers, and ways of dealing with both.
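The two effects described above can be illustrated with a short Python sketch. It is not the review's own procedure: the data are hypothetical, missing values are handled by listwise (complete-case) deletion, and outliers are flagged with Tukey's 1.5 × IQR fences, which is one common rule among several. The sketch shows how deletion shrinks the analyzable sample and how a single extreme value inflates both the mean and the standard deviation.

```python
import statistics

# Hypothetical measurements: None marks a missing value, and 150 is a
# deliberately planted extreme value (illustrative data, not from the review).
raw = [52, 55, 49, 51, 53, 50, 54, None, 48, None, 150]


def complete_cases(values):
    """Listwise deletion: keep only the observed (non-missing) values."""
    return [v for v in values if v is not None]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


observed = complete_cases(raw)   # fewer data points than were collected
flagged = iqr_outliers(observed)
cleaned = [v for v in observed if v not in flagged]

print(f"n: {len(raw)} collected, {len(observed)} analyzable after deletion")
print(f"with outlier:    mean={statistics.mean(observed):.2f}, "
      f"SD={statistics.stdev(observed):.2f}")
print(f"without outlier: mean={statistics.mean(cleaned):.2f}, "
      f"SD={statistics.stdev(cleaned):.2f}")
print("flagged:", flagged)
```

`statistics.quantiles` requires Python 3.8 or later and uses the "exclusive" interpolation method by default, so the fences it produces may differ slightly from those reported by other statistical software.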