Benford's Law and the screening of analytical data: the case of pollutant concentrations in ambient air.

Author information

Brown Richard J C

Affiliation

Analytical Science Group, National Physical Laboratory, Teddington, Middlesex, UK TW11 0LW.

Publication information

Analyst. 2005 Sep;130(9):1280-5. doi: 10.1039/b504462f. Epub 2005 Jul 26.

Abstract

The need to ensure the robustness of very large data sets produced by analytical measurement processes is increasing. This requires data screening techniques that can identify formatting or transcription errors in large data sets that have undergone multiple data-handling and manipulation procedures. The empirical observation that the digits 1 to 9 are not equally likely to appear as the initial digit in multi-digit numbers is known as Benford's Law, and may provide a solution to this requirement. Several sets of data pertaining to the measured concentrations of pollutants in ambient air in the UK in 2004 have been analysed for their initial digit frequencies in order to assess the potential for the use of Benford's Law as a data-screening and authenticity-checking tool for these types of analytical data sets. Benford's Law has been shown to be a robust top-level data screening tool provided that the numerical range of the data set being considered is four orders of magnitude or greater. It has been shown that small changes in the deviation of a data set from Benford's Law may indicate the introduction of errors during data processing. In this way, Benford's Law provides a sensitive technique for identifying data mishandling in large data sets.
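
As an illustration of the screening idea described in the abstract, the following minimal Python sketch compares the observed initial-digit frequencies of a data set with the Benford expectation, P(d) = log10(1 + 1/d) for d = 1 to 9. The function names and the deviation measure (a sum of absolute differences between observed and expected frequencies) are assumptions made for this example; the abstract does not state which deviation statistic the paper uses.

```python
import math
from collections import Counter


def benford_expected():
    """Expected initial-digit probabilities under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the first significant digit of a nonzero, finite number."""
    # Scientific notation with 17 significant digits puts the leading
    # significant digit first, e.g. 0.032 -> '3.2000...e-02'.
    return int(f"{abs(x):.16e}"[0])


def first_digit_frequencies(values):
    """Observed relative frequency of each initial digit 1..9."""
    digits = [first_digit(v) for v in values if v != 0 and math.isfinite(v)]
    if not digits:
        raise ValueError("no nonzero finite values to analyse")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def benford_deviation(values):
    """Hypothetical deviation score: sum of |observed - expected| over digits.

    This is an illustrative choice, not necessarily the paper's statistic.
    """
    observed = first_digit_frequencies(values)
    expected = benford_expected()
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10))


if __name__ == "__main__":
    # Synthetic data spanning many orders of magnitude (not measured data).
    wide_range = [2.0 ** k for k in range(200)]
    # Synthetic data confined to a single order of magnitude.
    narrow_range = [float(n) for n in range(100, 1000)]
    print("wide range deviation:  ", benford_deviation(wide_range))
    print("narrow range deviation:", benford_deviation(narrow_range))
```

In a screening workflow along the lines the abstract suggests, the same score could be computed before and after each data-handling step; a change in the score between steps would then be a prompt to inspect that step for formatting or transcription errors. The synthetic example also reflects the abstract's caveat that the comparison is only meaningful when the data span several orders of magnitude.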
