Barnhart Huiman X, Haber Michael J, Lin Lawrence I
Department of Biostatistics and Bioinformatics and Duke Clinical Research Institute, Duke University, Durham, North Carolina 27715, USA.
J Biopharm Stat. 2007;17(4):529-69. doi: 10.1080/10543400701376480.
Reliable and accurate measurements serve as the basis for evaluation in many scientific disciplines. Issues related to reliable and accurate measurement have evolved over many decades, dating back to the nineteenth century and the pioneering work of Galton (1886), Pearson (1896, 1899, 1901), and Fisher (1925). Requiring a new measurement to be identical to the truth is often impractical, either because (1) we are willing to accept a measurement up to some tolerable (or acceptable) error, or (2) the truth is simply not available to us, either because it is not measurable or is only measurable with some degree of error. To deal with issues related to both (1) and (2), a number of concepts, methods, and theories have been developed in various disciplines. Some of these concepts have been used across disciplines, while others have been limited to a particular field but may have potential uses in other disciplines. In this paper, we elucidate and contrast fundamental concepts employed in different disciplines and unite these concepts into one common theme: assessing closeness (agreement) of observations. We focus on assessing agreement with continuous measurements and classify different statistical approaches as (1) descriptive tools; (2) unscaled summary indices based on absolute differences of measurements; and (3) scaled summary indices attaining values between -1 and 1 for various data structures, and for cases with and without a reference. We also identify gaps that require further research and discuss future directions in assessing agreement.
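To make the three classes of approaches concrete, the following is a minimal Python sketch for paired continuous readings from two methods. The abstract does not name specific indices, so the choices here are illustrative assumptions rather than the authors' prescribed methods: Bland-Altman 95% limits of agreement as a descriptive tool, the mean squared deviation (MSD) as an unscaled index, and Lin's concordance correlation coefficient (CCC) as a scaled index between -1 and 1. The function name `agreement_summary` and its return keys are hypothetical.

```python
from statistics import fmean


def agreement_summary(x, y):
    """Illustrative agreement measures for paired continuous readings x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two readings")
    n = len(x)
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = fmean(diffs)
    # Sample standard deviation of the paired differences (n - 1 divisor).
    sd_diff = (sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)) ** 0.5

    # (1) Descriptive tool (assumed example): Bland-Altman 95% limits of agreement.
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)

    # (2) Unscaled index (assumed example): mean squared deviation,
    #     expressed in squared measurement units.
    msd = fmean([d * d for d in diffs])

    # (3) Scaled index (assumed example): Lin's CCC, using 1/n moments.
    mx, my = fmean(x), fmean(y)
    var_x = fmean([(a - mx) ** 2 for a in x])
    var_y = fmean([(b - my) ** 2 for b in y])
    cov_xy = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    denom = var_x + var_y + (mx - my) ** 2
    # CCC is undefined when both series are the same constant (denom == 0).
    ccc = 2 * cov_xy / denom if denom > 0 else float("nan")

    return {"mean_diff": mean_diff, "limits": limits, "msd": msd, "ccc": ccc}
```

With 1/n moments the MSD equals the CCC denominator minus twice the covariance, so CCC = 1 - MSD / (var_x + var_y + (mean_x - mean_y)^2). This shows how the scaled index rescales the unscaled one. For example, `agreement_summary([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])` gives an MSD of 1.0 and a CCC of 0.8, reflecting a constant offset of one unit between the two methods.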