Hayes Kevin, Kinsella Anthony, Coffey Norma
Department of Mathematics and Statistics, University of Limerick, Limerick, Republic of Ireland.
Clin Biochem. 2007 Feb;40(3-4):147-52. doi: 10.1016/j.clinbiochem.2006.08.019. Epub 2006 Oct 19.
This paper examines the pitfalls that arise when an outlier is assessed using a criterion based on a fixed multiple of the standard deviation rather than an established statistical test. Although the former approach is statistically invalid, it is the favored method for identifying outliers in Ontario laboratory quality control protocols.
Computer simulations are used to calculate the probability of a false positive result (classifying a valid observation as an outlier) when outlier criteria based on fixed multiples of the standard deviation are applied to samples containing no outliers.
The estimated probability of a false positive result is tabulated over various sample sizes. Outlier criteria based on fixed multiples of the standard deviation are shown to be highly inefficient.
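A simulation of this kind can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `false_positive_rate`, the choice of standard normal data, the number of trials, and the sample sizes shown are all assumptions made for the example. Each simulated sample contains no outliers by construction, so any observation flagged by the rule |x - mean| > k * SD counts as a false positive.

```python
import math
import random


def false_positive_rate(n, k, trials=2000, seed=0):
    """Estimate the probability that a clean sample of size n has at least
    one observation farther than k sample SDs from the sample mean.

    Assumption: observations are i.i.d. standard normal (no true outliers).
    """
    rng = random.Random(seed)
    flagged = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        # Sample standard deviation with the usual n - 1 divisor.
        sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        if any(abs(x - mean) > k * sd for x in sample):
            flagged += 1
    return flagged / trials


if __name__ == "__main__":
    # Tabulate the estimated false positive probability by sample size
    # for the 2 SD and 3 SD criteria (sample sizes chosen for illustration).
    print(f"{'n':>5} {'2 SD':>8} {'3 SD':>8}")
    for n in (5, 10, 20, 50, 100):
        p2 = false_positive_rate(n, k=2.0)
        p3 = false_positive_rate(n, k=3.0)
        print(f"{n:>5} {p2:>8.3f} {p3:>8.3f}")
```

Under these assumptions the estimate depends strongly on the sample size. In very small samples the rule can never fire, because no observation can lie more than (n - 1)/sqrt(n) sample standard deviations from the sample mean; in large samples a clean data set is almost certain to contain at least one flagged value.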
This work presents arguments for discontinuing the widespread practice of using outlier criteria based on fixed multiples of the standard deviation to identify outliers in univariate samples.