Detection of outliers in reference distributions: performance of Horn's algorithm.

Authors

Solberg Helge Erik, Lahti Ari

Affiliation

Department of Medical Biochemistry, Rikshospitalet-Radiumhospitalet HF, Oslo, Norway.

Publication

Clin Chem. 2005 Dec;51(12):2326-32. doi: 10.1373/clinchem.2005.058339. Epub 2005 Oct 13.

DOI:10.1373/clinchem.2005.058339

PMID:16223885

Abstract

BACKGROUND

Medical laboratory reference data may be contaminated with outliers that should be eliminated before estimation of the reference interval. A statistical test for outliers has been proposed by Paul S. Horn and coworkers (Clin Chem 2001;47:2137-45). The algorithm operates in 2 steps: (a) mathematically transform the original data to approximate a gaussian distribution; and (b) establish detection limits (Tukey fences) based on the central part of the transformed distribution.
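The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the Box and Cox parameter is chosen by a coarse grid search over the profile log-likelihood (an assumption about how it is estimated), and the fence multiplier of 1.5 times the interquartile range is the conventional Tukey choice, assumed here.

```python
import math
import statistics

def box_cox_transform(values, lam):
    """Step (a): Box and Cox power transform of strictly positive values."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def profile_loglik(values, lam):
    """Gaussian profile log-likelihood of the transformed data (up to a constant)."""
    y = box_cox_transform(values, lam)
    n = len(values)
    return (-0.5 * n * math.log(statistics.pvariance(y))
            + (lam - 1.0) * sum(math.log(v) for v in values))

def estimate_lambda(values):
    """Assumed estimator: grid search over lambda in [-2, 2], step 0.05."""
    grid = [i / 20 for i in range(-40, 41)]
    return max(grid, key=lambda lam: profile_loglik(values, lam))

def tukey_fences(y, k=1.5):
    """Step (b): fences built from the quartiles, i.e. the central part of the data."""
    q1, _, q3 = statistics.quantiles(y, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def horn_outliers(values, lam=None, k=1.5):
    """Return the original values whose transformed value falls outside the fences."""
    if lam is None:
        lam = estimate_lambda(values)
    y = box_cox_transform(values, lam)
    lo, hi = tukey_fences(y, k)
    return [v for v, t in zip(values, y) if t < lo or t > hi]
```

Calling `horn_outliers(data)` estimates the transformation parameter from the data itself; passing `lam=0.0` forces a plain log transform, which is convenient for checking the fence logic by hand.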

METHODS

We studied the specificity of Horn's test algorithm (probability of false detection of outliers), using Monte Carlo computer simulations performed on 13 types of probability distributions covering a wide range of positive and negative skewness. Distributions with 3% of the original observations replaced by random outliers were used to also examine the sensitivity of the test (probability of detection of true outliers). Three data transformations were used: the Box and Cox function (used in the original Horn's test), the Manly exponential function, and the John and Draper modulus function.
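For orientation, the three transformations named above can be written in their usual textbook forms as follows. This is a sketch only: the function names are hypothetical, and the exact parameterization used in the study (for example, any location shift applied before transforming) is not given in the abstract and is not assumed here.

```python
import math

def box_cox(x, lam):
    """Box and Cox power transform; defined only for x > 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive data")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def manly_exponential(x, lam):
    """Manly exponential transform; defined for any real x."""
    return x if lam == 0 else (math.exp(lam * x) - 1.0) / lam

def john_draper_modulus(x, lam):
    """John and Draper modulus transform; acts symmetrically about zero."""
    sign = -1.0 if x < 0 else 1.0
    if lam == 0:
        return sign * math.log(abs(x) + 1.0)
    return sign * ((abs(x) + 1.0) ** lam - 1.0) / lam
```

The Box and Cox and Manly forms are aimed mainly at removing skewness, whereas the modulus form changes both tails in the same way and is therefore usually described as a tool for adjusting kurtosis in roughly symmetric data.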

RESULTS

For many of the probability distributions, the specificity of Horn's algorithm was rather poor compared with the theoretical expectation. The cause for such poor performance was at least partially related to remaining nongaussian kurtosis (peakedness). The sensitivity showed great variation, dependent on both the type of underlying distribution and the location of the outliers (upper and/or lower tail).
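The link between kurtosis and false detections can be illustrated with a small Monte Carlo sketch. It is not the authors' simulation: it applies untransformed Tukey fences (multiplier 1.5, assumed) to two symmetric distributions chosen for illustration, and it reports the fraction of individual observations flagged, which may differ from the specificity measure used in the study. For exactly gaussian data the fences lie at about 2.7 standard deviations from the mean, so roughly 0.7% of observations are expected to be flagged.

```python
import random
import statistics

def tukey_flag_rate(sample, k=1.5):
    """Fraction of observations lying outside the Tukey fences."""
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return sum(v < lo or v > hi for v in sample) / len(sample)

def excess_kurtosis(sample):
    """Moment-based excess kurtosis (0 for a gaussian distribution)."""
    mean = statistics.fmean(sample)
    var = statistics.pvariance(sample, mu=mean)
    m4 = statistics.fmean([(v - mean) ** 4 for v in sample])
    return m4 / var ** 2 - 3.0

def simulate(n=20000, seed=1):
    """Compare an outlier-free gaussian sample with a heavier-tailed Laplace sample."""
    rng = random.Random(seed)
    gauss = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Laplace draw: exponential magnitude with a random sign
    laplace = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(n)]
    return {
        "gaussian": (excess_kurtosis(gauss), tukey_flag_rate(gauss)),
        "laplace": (excess_kurtosis(laplace), tukey_flag_rate(laplace)),
    }

if __name__ == "__main__":
    for name, (kurt, rate) in simulate().items():
        print(f"{name}: excess kurtosis {kurt:.2f}, flagged fraction {rate:.4f}")
```

Neither sample contains true outliers, so every flagged value is a false detection; the heavier-tailed sample is flagged far more often even though it is perfectly symmetric.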

CONCLUSION

Although Horn's algorithm undoubtedly is an improvement compared with older methods for outlier detection, reliable statistical identification of outliers in reference data remains a challenge.
