Fancourt Hayley S M, Stephan Carl N
Laboratory for Human Craniofacial and Skeletal Identification (HuCS-ID Lab), School of Biomedical Sciences, The University of Queensland, Brisbane 4072, Australia.
Forensic Sci Int. 2018 Apr;285:162-171. doi: 10.1016/j.forsciint.2018.02.008. Epub 2018 Feb 21.
For measurements to be accurate and precise, measurement errors should be small. In the anthropometry and craniofacial identification literature, four methods are commonly used for assessing measurement error: Pearson's product moment correlation coefficient (r), intra-class correlation coefficients (ICC), statistical significance tests (often reported as P-values) and the technical error of measurement (TEM; also known as Dahlberg's error/ratio). In this paper, the performance of all four statistics was evaluated using maximum cranial lengths (g-op) from Howells (n=2524), by duplicating the dataset and mathematically adding known degrees of error to the second set. This was repeated across a broad array of trials (2000 in total), each with a slightly different amount of simulated error, so that the descriptive power and utility of the four error metrics could be assessed comprehensively on the same data. Data simulations included the addition of random and systematic errors of different sizes, with absolute differences ranging from 1 to 50 mm (in relative terms, up to 28% of the original measurement). Two sample sizes (n=25 and n=2524 individuals) were explored and all analyses were conducted in R. P-values from Student's t-tests showed significant differences (P<0.05) only for the larger sample size when the error was systematic. Small samples, and samples of either size with random error, did not yield low or significant P-values. When raw differences were <4 mm for 95% of the sample (n=2524), the ICC and r were high (>0.97), and they remained high even after tripling the error, such that 95% of the sample possessed raw differences up to 12 mm (r=0.8). In contrast, the TEM was initially low (<2 mm, or r-TEM <1%) and then increased (<4.5 mm and 2.5% for TEM and r-TEM, respectively). These data show that P-values, ICC and r values have substantial limitations for describing error, as they do not always flag error well. In contrast, TEM appears to covary with error more saliently and holds the advantage that changes are reported in the units of the original measurement. For these reasons, TEM is recommended in preference to P-values, ICC and r.
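The comparison described above can be illustrated with a minimal sketch. It is not the authors' code (their analyses were run in R) and it does not use the Howells data: the cranial lengths below are synthetic values drawn from a normal distribution with an assumed mean of 180 mm and standard deviation of 8 mm, and the error sizes are illustrative choices. TEM is computed with the standard Dahlberg formula, sqrt(sum(d^2) / 2n), and r-TEM as TEM divided by the grand mean. The ICC variant is assumed to be the one-way random-effects form, since the abstract does not say which was used, and the paired t statistic is reported without a P-value.

```python
import math
import random


def tem(first, second):
    """Technical error of measurement (Dahlberg): sqrt(sum(d^2) / (2n))."""
    n = len(first)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * n))


def relative_tem(first, second):
    """TEM expressed as a percentage of the grand mean of both sets."""
    grand_mean = (sum(first) + sum(second)) / (2 * len(first))
    return 100.0 * tem(first, second) / grand_mean


def pearson_r(first, second):
    """Pearson's product moment correlation coefficient."""
    n = len(first)
    ma, mb = sum(first) / n, sum(second) / n
    sab = sum((a - ma) * (b - mb) for a, b in zip(first, second))
    saa = sum((a - ma) ** 2 for a in first)
    sbb = sum((b - mb) ** 2 for b in second)
    return sab / math.sqrt(saa * sbb)


def icc_oneway(first, second):
    """One-way random-effects ICC for k = 2 measurements per subject
    (an assumed variant; the abstract does not specify the ICC form)."""
    n, k = len(first), 2
    grand = (sum(first) + sum(second)) / (k * n)
    means = [(a + b) / 2 for a, b in zip(first, second)]
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ss_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(first, second, means))
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def paired_t(first, second):
    """Paired t statistic for the mean difference (no P-value computed)."""
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    mean_d = sum(d) / n
    sd_d = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n))


rng = random.Random(42)
# Synthetic stand-in for g-op lengths in mm (assumed mean and SD, not Howells data).
original = [rng.gauss(180.0, 8.0) for _ in range(2524)]
# Random error: about 95% of differences fall within 4 mm, then within 12 mm.
random_small = [x + rng.gauss(0.0, 2.0) for x in original]
random_large = [x + rng.gauss(0.0, 6.0) for x in original]
# Systematic error: a constant 2 mm offset on top of the small random error.
systematic = [x + 2.0 + rng.gauss(0.0, 2.0) for x in original]

for label, repeat in [("random, small", random_small),
                      ("random, large", random_large),
                      ("systematic", systematic)]:
    print(f"{label:14s} TEM={tem(original, repeat):.2f} mm  "
          f"r-TEM={relative_tem(original, repeat):.2f}%  "
          f"r={pearson_r(original, repeat):.3f}  "
          f"ICC={icc_oneway(original, repeat):.3f}  "
          f"t={paired_t(original, repeat):.2f}")
```

Running the sketch prints all four statistics side by side for each error condition, so the way TEM tracks the size of the added error, in millimetres, can be compared directly with the behaviour of r, ICC and the t statistic.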