De Vocht Frank, Kromhout Hans
Centre for Occupational and Environmental Health, School of Community Based Medicine, Manchester Academic Health Science Centre, The University of Manchester, Ellen Wilkinson Building, Oxford Road, Manchester M13 9PL, UK.
Ann Occup Hyg. 2013 Apr;57(3):296-304. doi: 10.1093/annhyg/mes067. Epub 2012 Sep 20.
Benford's law is the counterintuitive empirical observation that the digits 1-9 are not equally likely to appear as the initial digit in numbers resulting from the same phenomenon. Manipulated, unrelated, or fabricated numbers usually do not follow Benford's law, so the law has been used to investigate fraudulent data (for example, in accounting) and to identify errors in data sets (for example, those introduced by data transfer). We describe the use of Benford's law to screen occupational hygiene measurement data sets, using exposure data from the European rubber manufacturing industry as an illustration. Four data sets added to the European Union ExAsRub project database were compared with the expected first-digit (1BL) and second-digit (2BL) Benford distributions: two rubber process dust measurement data sets initially collected by the UK Health and Safety Executive (HSE) and the British Rubber Manufacturers' Association (BRMA), and one pre-treatment and one post-treatment N-nitrosamines data set collated in the German MEGA database. Evaluation indicated only small deviations from the expected 1BL and 2BL distributions for the data sets collated by the UK HSE and by industry (BRMA), while larger deviations were observed for the MEGA data. To a large extent the latter could be attributed to imputation, and replacement by a constant, of N-nitrosamine measurements below the limit of detection, but further evaluation of these data to determine why other deviations from the expected 1BL and 2BL distributions exist may be beneficial. Benford's law is a straightforward and easy-to-implement analytical tool for evaluating the quality of occupational hygiene data sets. As such, it can be used to detect potential problems in large data sets that may be caused by malicious a priori or a posteriori manipulation of data sets and by issues such as the treatment of observations below the limit of detection, rounding, and transfer of data.
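As an illustration of the screening described above, the following is a minimal Python sketch of how observed first- and second-digit frequencies might be compared with the expected 1BL and 2BL distributions. The expected probabilities follow the standard Benford formulas, P(d) = log10(1 + 1/d) for the first digit and the sum over first digits k = 1..9 of log10(1 + 1/(10k + d)) for the second digit. The function names, the chi-square-style deviation statistic, and the use of powers of 2 as demonstration input are assumptions made here for illustration; the abstract does not state which statistic or software the authors used.

```python
import math
from collections import Counter


def benford_first_digit():
    """Expected 1BL probabilities: P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_second_digit():
    """Expected 2BL probabilities for d = 0..9, summed over first digits 1..9."""
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


def significant_digits(x):
    """Return the 15 leading significant digits of a nonzero number as a string.

    Scientific notation removes leading zeros, e.g. 0.0452 -> '452000000000000'.
    Numbers with a single significant figure get 0 as their second digit.
    """
    mantissa = f"{abs(x):.14e}".split("e")[0]
    return mantissa.replace(".", "")


def digit_counts(values, position):
    """Count digits at `position` (0 = first, 1 = second); skips values <= 0."""
    return Counter(int(significant_digits(v)[position]) for v in values if v > 0)


def observed_frequencies(values, position):
    """Observed relative frequency of each digit that occurs at `position`."""
    counts = digit_counts(values, position)
    n = sum(counts.values())
    return {d: c / n for d, c in sorted(counts.items())}


def chi_square_deviation(values, position):
    """Pearson chi-square statistic of observed counts against Benford.

    Assumption: this statistic is one common choice for such comparisons,
    not necessarily the one used in the study.
    """
    expected = benford_first_digit() if position == 0 else benford_second_digit()
    counts = digit_counts(values, position)
    n = sum(counts.values())
    return sum((counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in expected.items())


# Demonstration input (not measurement data): powers of 2 are a standard
# example of a sequence whose leading digits approximately follow Benford's law.
demo_values = [2 ** k for k in range(100)]
print("observed first-digit frequencies:", observed_frequencies(demo_values, 0))
print("first-digit chi-square:", chi_square_deviation(demo_values, 0))
print("second-digit chi-square:", chi_square_deviation(demo_values, 1))
```

In this sketch, values at or below zero are skipped. Measurements below the limit of detection that were replaced by a single constant would all contribute the same leading digits, which is the kind of deviation the abstract reports for the MEGA data.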