Chemistry Research Labs, Drug Discovery Research, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba, Ibaraki 305-8585, Japan.
Drug Discov Today. 2010 May;15(9-10):328-31. doi: 10.1016/j.drudis.2010.03.003. Epub 2010 Mar 16.
Benford's law states that the first digits of many data sets are not uniformly distributed. In data that follow the law, the first digit is 1 about 30% of the time, and larger digits occur as the first digit with progressively lower frequency, down to 9, which occurs as a first digit only about 5% of the time. Here, we demonstrate that several data sets in the field of drug discovery follow Benford's distribution, whereas 'doctored' data do not. Our findings indicate that Benford's law can be applied to assess data quality in drug discovery. We also propose a useful index for evaluating data quality based on Benford's law.
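The first-digit frequencies quoted above come from the standard statement of Benford's law, P(d) = log10(1 + 1/d) for d = 1, ..., 9. The following is a minimal sketch of how a data set could be compared against that distribution. The abstract does not specify the authors' quality index, so the mean absolute deviation used here is only an illustrative stand-in, and all function names are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the leading nonzero digit of x, or None for 0 and non-finite values."""
    x = abs(float(x))
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the leading digit first, e.g. 0.00123 -> '1.23...e-03'.
    return int(f"{x:.15e}"[0])


def first_digit_frequencies(values):
    """Observed relative frequency of each first digit 1..9."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("no usable (nonzero, finite) values")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def mean_abs_deviation(values):
    """Illustrative deviation score (NOT the index proposed in the paper):
    mean absolute difference between observed and Benford frequencies."""
    expected = benford_expected()
    observed = first_digit_frequencies(values)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


if __name__ == "__main__":
    for d, p in benford_expected().items():
        print(d, round(p, 3))
    # Powers of 2 are a classic sequence that tracks Benford's law closely.
    print(mean_abs_deviation([2 ** k for k in range(100)]))
```

A lower score means the observed first digits sit closer to Benford's distribution. Any threshold for flagging a data set as suspicious would have to be chosen and validated separately.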