Valet Fabien, Guinot Christiane, Ezzedine Khaled, Mary Jean-Yves
Inserm U717, Département de Biostatistique et Informatique Médicale, Saint-Louis Hospital, 1 Avenue Claude Vellefaux, F-75010 Paris, University Paris 7, France.
J Clin Epidemiol. 2008 Oct;61(10):983-90. doi: 10.1016/j.jclinepi.2007.11.004. Epub 2008 May 27.
Ordinal scales are used extensively in health research, and the reproducibility of ratings made with these scales is an important measure of their quality. This study aimed to compare two methods for analyzing reproducibility: the weighted Kappa statistic and log-linear models.
The contribution of each method to assessing the reproducibility of ordinal-scale ratings was compared using intra- and interobserver data from three different fields: the Crow's feet scale in dermatology, the dysplasia scale in oncology, and the updated Sydney scale in gastroenterology.
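As an illustration of the first method, the weighted Kappa statistic can be computed from a square table that cross-classifies two sets of ratings. The sketch below is a minimal version and is not taken from the study: the function name `weighted_kappa`, the choice between linear and quadratic agreement weights, and the example counts are all assumptions, since the abstract does not state which weighting scheme or data layout was used.

```python
import numpy as np

def weighted_kappa(table, weights="linear"):
    """Weighted Kappa for a k x k table of two sets of ordinal ratings.

    table[i, j] is the number of subjects rated category i on the first
    occasion (or by the first observer) and category j on the second.
    The weighting scheme is an assumption, not the study's stated choice.
    """
    table = np.asarray(table, dtype=float)
    k = table.shape[0]
    p = table / table.sum()                    # observed joint proportions

    # Agreement weights: 1 on the diagonal, decreasing with distance.
    i, j = np.indices((k, k))
    d = np.abs(i - j) / (k - 1)
    if weights == "linear":
        w = 1.0 - d
    elif weights == "quadratic":
        w = 1.0 - d ** 2
    else:
        raise ValueError("weights must be 'linear' or 'quadratic'")

    row = p.sum(axis=1)                        # marginals of first rating
    col = p.sum(axis=0)                        # marginals of second rating
    p_obs = (w * p).sum()                      # weighted observed agreement
    p_exp = (w * np.outer(row, col)).sum()     # weighted chance agreement
    return (p_obs - p_exp) / (1.0 - p_exp)

# Hypothetical 3-category table, invented for illustration only.
example = [[20, 5, 0],
           [3, 15, 4],
           [0, 2, 11]]
print(weighted_kappa(example, "linear"))
print(weighted_kappa(example, "quadratic"))
```

A single coefficient of this kind summarizes the overall level of agreement but carries no information about which pairs of categories are being confused.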
Both methods provided an agreement level. In addition, log-linear models allowed evaluation of the structure of agreement. For the Crow's feet scale, both methods gave equivalent, high agreement levels. For the dysplasia scale, log-linear models highlighted scale defects, whereas the Kappa statistic showed moderate agreement. For the updated Sydney scale, log-linear models revealed null distinguishability between two adjacent categories, whereas the Kappa statistic gave a high global agreement level.
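The notion of distinguishability between two categories can be made concrete with a small sketch. It assumes the degree-of-distinguishability measure of Darroch and McCloud, 1 - (n_ij * n_ji) / (n_ii * n_jj), which may or may not be the exact quantity used in the study, and it applies that measure to raw counts rather than to fitted values from a log-linear model, so it is only a crude empirical analogue. The function name `distinguishability` and the table are hypothetical.

```python
import numpy as np

def distinguishability(table, i, j):
    """Crude empirical degree of distinguishability between categories i and j.

    Assumed form: 1 - (n_ij * n_ji) / (n_ii * n_jj), computed on raw counts.
    A model-based analysis would use fitted values instead. A value near 0
    means the two categories are used almost interchangeably; a value
    near 1 means they are rarely confused.
    """
    t = np.asarray(table, dtype=float)
    return 1.0 - (t[i, j] * t[j, i]) / (t[i, i] * t[j, j])

# Hypothetical table in which categories 0 and 1 are confused as often
# as they are agreed upon, while category 2 is well separated.
confused = [[10, 10, 0],
            [10, 10, 1],
            [0, 1, 30]]
print(distinguishability(confused, 0, 1))  # categories 0 and 1
print(distinguishability(confused, 1, 2))  # categories 1 and 2
```

In a table like this one, a global agreement coefficient can still look acceptable because the well-separated category dominates, which is the kind of contrast reported for the updated Sydney scale.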
Methods that can investigate both the level and the structure of agreement between ordinal ratings are valuable tools, since they may highlight heterogeneities within the scales' structure and suggest modifications to improve their reproducibility.