Statistical description of interrater variability in ordinal ratings.

Authors

Nelson J C, Pepe M S

Affiliation

Department of Biostatistics, University of Washington, Box 357232, F-600 Health Sciences Building, 1705 NE Pacific, Seattle, WA 98195-7232, USA.

Publication

Stat Methods Med Res. 2000 Oct;9(5):475-96. doi: 10.1177/096228020000900505.

DOI:10.1177/096228020000900505

PMID:11191261

Abstract

Ordinal categorical assessments are common in medical practice and in research. Variability in such measurements among the raters making the assessments can be problematic. In this paper we consider how such variability can be described statistically. We review three current approaches, namely kappa-type statistics, loglinear models for agreement, and latent class agreement models, and discuss their limitations. We present a new graphical approach to describing interrater variability that involves a simple frequency distribution display of the category probabilities. The method enables description of interrater variability when raters are a random sample from some population, rather than in the traditional setting in which only a few selected raters provide assessments. Advantages of this approach relative to current approaches include the following: (1) it provides a simple visual summary of the rating data, (2) description is closely linked to familiar methods for describing variability in continuous measurements, (3) interpretation is straightforward, and (4) a large sample of raters can be accommodated with ease. We illustrate the method on simulated ordinal data representing radiologists' ratings of mammography images and on rating data from a national image reading study of mammography screening.
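The sketch below gives a rough illustration of the kind of display the abstract describes. It assumes that each rater in a sample assigns every image to one of K ordered categories. For each rater, it estimates the proportion of images placed in each category. It then summarizes how those category probabilities vary across raters using the minimum, median, maximum and a coarse binned frequency count, much as one would summarize variability in a continuous measurement. This is one plausible reading and may not match the authors' exact construction. For example, the paper may define probabilities per image or cumulatively. The data, the helper names `category_probabilities` and `histogram`, and the choice of K = 5 are all hypothetical.

```python
from collections import Counter
from statistics import median

K = 5  # hypothetical number of ordinal categories (1 = lowest, K = highest)

# Hypothetical data: ratings[r] lists the categories rater r gave to the
# same 8 images. These numbers are made up for illustration only.
ratings = {
    "rater_A": [1, 1, 2, 2, 3, 4, 5, 5],
    "rater_B": [1, 2, 2, 3, 3, 4, 4, 5],
    "rater_C": [1, 1, 1, 2, 2, 3, 4, 5],
    "rater_D": [2, 2, 3, 3, 4, 4, 5, 5],
}

def category_probabilities(scores, k_max):
    """Empirical probability that this rater uses each category 1..k_max."""
    counts = Counter(scores)
    n = len(scores)
    return [counts.get(k, 0) / n for k in range(1, k_max + 1)]

def histogram(values, bins=4):
    """Count values in [0, 1] falling into equal-width bins."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1  # put v == 1.0 in the last bin
    return counts

# p[r][k-1] = estimated probability that rater r assigns category k
p = {r: category_probabilities(s, K) for r, s in ratings.items()}

for k in range(1, K + 1):
    # Distribution of category-k probabilities across the sample of raters
    values = sorted(p[r][k - 1] for r in p)
    print(f"category {k}: min={values[0]:.3f} median={median(values):.3f} "
          f"max={values[-1]:.3f} bins={histogram(values)}")
```

Each rater contributes one value per category, so adding raters only lengthens the lists being summarized. This is in the spirit of advantage (4) above. With many raters, the per-category counts could be plotted as ordinary histograms.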
