Brown Richard J C
Analytical Science Group, National Physical Laboratory, Teddington, Middlesex, UK.
Analyst. 2007 Apr;132(4):344-9. doi: 10.1039/b618255k. Epub 2007 Feb 14.
This study shows, for the first time, the effectiveness of Zipf's law in screening analytical data sets for outliers, data-formatting errors and data-transcription errors, particularly when the data sets are small. For pollutant concentrations in ambient air, two characteristics allow a Zipf's law approach to data screening to succeed: the multivariate nature of the measurement, and the relationships between the measured values of these multivariate quantities. Furthermore, Zipf's law is shown to have advantages over other novel data screening techniques, such as Benford's law, in terms of sensitivity and scope.
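The abstract does not describe the screening procedure itself. The following is a minimal sketch of one generic way to use a Zipf-type rank-size law for screening, and it is not the author's method. It assumes that the sorted values roughly follow `value ∝ rank^(-alpha)`, fits that line on log-log axes, and flags points with large residuals. The function name `zipf_screen`, the 3.5 threshold and the MAD-based modified z-score are illustrative choices. This univariate version also ignores the relationships between several measured quantities that the abstract identifies as essential in the ambient-air case.

```python
import math
from statistics import median


def zipf_screen(values, threshold=3.5):
    """Hypothetical helper: flag values that deviate from a Zipf-type rank-size law.

    Positive values are sorted in descending order and ranked 1..n. A straight
    line log(value) = c - alpha * log(rank) is fitted by ordinary least squares.
    Points whose modified z-score (based on the median absolute deviation of
    the residuals) exceeds `threshold` are flagged.
    Returns (alpha, flagged), where flagged is a list of (rank, value) pairs.
    """
    # Logarithms need positive values; ranks refer to the filtered, sorted list.
    ranked = sorted((v for v in values if v > 0), reverse=True)
    n = len(ranked)
    if n < 3:
        raise ValueError("need at least three positive values")

    xs = [math.log(r) for r in range(1, n + 1)]
    ys = [math.log(v) for v in ranked]

    # Ordinary least-squares fit of log(value) against log(rank).
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    alpha = -slope

    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

    # Robust spread: a single gross error inflates the standard deviation but
    # barely moves the median absolute deviation (MAD).
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    if mad < 1e-9:
        # Residuals are essentially identical (a near-perfect power-law fit).
        return alpha, []

    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    flagged = [
        (rank, value)
        for rank, (value, r) in enumerate(zip(ranked, residuals), start=1)
        if abs(0.6745 * (r - med) / mad) > threshold
    ]
    return alpha, flagged
```

For example, take nine values that follow `100 / rank` exactly and add a tenth value of `1.0`, where `10.0` was expected (a decimal-shift transcription error). Calling `zipf_screen` on this set flags `(10, 1.0)`. The exact `100 / rank` series alone gives `alpha` close to 1 and no flags.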