Nelson Kerrie P, Zhou Thomas J
Department of Biostatistics, Boston University, Boston, Massachusetts, USA.
Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA.
Stat Med. 2025 Jul;44(15-17):e70141. doi: 10.1002/sim.70141.
Cohen's kappa and other summary measures are often used in clinical studies to describe agreement and association between two experts' ordered categorical ratings. However, a key limitation of Cohen's kappa and similar measures is their inability to evaluate the impact of patient-related factors, such as family history and age, on the agreement and association between experts. Strong agreement between experts is an essential component of effective clinical procedures that require an expert's subjective interpretation of patients' images or test results, for example, the visual assessment of breast density from a mammogram. Not accounting for important patient-related factors can lead to inflated and biased assessments of agreement and association. In this article, we propose novel model-based measures that overcome limitations of existing measures by appropriately accounting for the impact of patient-related covariates on chance-corrected agreement and association between two experts' ordinal ratings. Our population-based approach is based on an ordinal generalized linear mixed model (GLMM). We report rigorous simulation studies evaluating the performance of the new model-based measures in a broad range of settings. Existing and new measures are compared in two clinical applications assessing breast density and multiple sclerosis. Key advantages of the new kappa measures over existing measures such as Cohen's kappa include incorporating patient-related factors, robustness to underlying disease prevalence and to the marginal distributions of experts' ratings, and appropriate correction for chance agreement. Sample R code is provided by the authors for applying the proposed measures in other studies.
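For readers unfamiliar with the baseline measure, the following is a minimal Python sketch of the standard unweighted Cohen's kappa, the existing measure that the abstract contrasts against. It is not the authors' GLMM-based measure and is not taken from their R code. The function name `cohens_kappa` and the ten example ratings are hypothetical and serve only as illustration. Note that the calculation uses only the two raters' marginal distributions to correct for chance, so patient-level covariates such as age or family history cannot enter it.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters' categorical ratings.

    Illustrative helper (hypothetical name), not the authors' method.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: share of patients given the same category by both raters.
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: expected share of matches if the raters were independent,
    # computed from each rater's marginal category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / n**2

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_chance) / (1 - p_chance)


# Hypothetical four-category density ratings for ten patients (made-up data).
rater_1 = [1, 2, 2, 3, 3, 3, 4, 4, 2, 1]
rater_2 = [1, 2, 3, 3, 3, 2, 4, 3, 2, 1]
kappa = cohens_kappa(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

In this made-up example the raters agree on 7 of 10 patients (observed agreement 0.70) and the chance agreement implied by their marginals is 0.27, so kappa is (0.70 - 0.27) / (1 - 0.27), roughly 0.59.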