Nelson Kerrie P, Mitani Aya A, Edwards Don
Department of Biostatistics, Boston University, Boston, MA, 02118, USA.
Department of Statistics, University of South Carolina, Columbia, SC, 29205, USA.
Biom J. 2018 May;60(3):639-656. doi: 10.1002/bimj.201700078. Epub 2018 Jan 19.
Large-scale agreement studies are becoming increasingly common in medical settings to gain better insight into discrepancies often observed between experts' classifications. Ordered categorical scales are routinely used to classify subjects' disease and health conditions. Summary measures such as Cohen's weighted kappa are popular approaches for reporting levels of association for pairs of raters' ordinal classifications. However, in large-scale studies with many raters, assessing levels of association can be challenging due to dependencies between many raters each grading the same sample of subjects' results and the ordinal nature of the ratings. Further complexities arise when the focus of a study is to examine the impact of rater and subject characteristics on levels of association. In this paper, we describe a flexible approach based upon the class of generalized linear mixed models to assess the influence of rater and subject factors on association between many raters' ordinal classifications. We propose novel model-based measures for large-scale studies to provide simple summaries of association similar to Cohen's weighted kappa while avoiding the prevalence and marginal distribution issues to which Cohen's weighted kappa is susceptible. The proposed summary measures can be used to compare association between subgroups of subjects or raters. We demonstrate the use of hypothesis tests to formally determine if rater and subject factors have a significant influence on association, and describe approaches for evaluating the goodness-of-fit of the proposed model. The performance of the proposed approach is explored through extensive simulation studies, and the approach is applied to a recent large-scale breast cancer screening study.
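For readers unfamiliar with the pairwise baseline mentioned in the abstract, the following is a minimal sketch of the standard Cohen's weighted kappa for two raters. It is not the model-based measure proposed in the paper. The function name `weighted_kappa`, the integer coding of categories as 0 to k-1, and the example ratings are assumptions made here for illustration only.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, n_categories, weighting="quadratic"):
    """Cohen's weighted kappa for two raters (illustrative sketch).

    Categories are assumed to be coded as integers 0..n_categories-1.
    Disagreement weights are (|i - j| / (k - 1)) ** p, with p = 2 for
    quadratic weighting and p = 1 for linear weighting.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must grade the same non-empty sample")
    if n_categories < 2:
        raise ValueError("need at least two ordered categories")
    if weighting not in ("quadratic", "linear"):
        raise ValueError("weighting must be 'quadratic' or 'linear'")

    n = len(ratings_a)
    k = n_categories
    power = 2 if weighting == "quadratic" else 1

    observed = Counter(zip(ratings_a, ratings_b))  # joint counts
    marginal_a = Counter(ratings_a)                # rater A category counts
    marginal_b = Counter(ratings_b)                # rater B category counts

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += weight * observed[(i, j)] / n
            # Expected under independence of the two raters' marginals.
            expected_disagreement += weight * marginal_a[i] * marginal_b[j] / (n * n)

    if expected_disagreement == 0.0:
        raise ValueError("kappa is undefined: both raters used one identical category")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical ratings on a 4-category ordinal scale (made-up data).
rater_1 = [0, 1, 1, 2, 3, 2, 0, 1]
rater_2 = [0, 1, 2, 2, 3, 1, 0, 0]
print(weighted_kappa(rater_1, rater_2, n_categories=4))
```

Because the expected-disagreement term is built from each rater's marginal category frequencies, the resulting value depends on prevalence and on the marginal distributions, which is the sensitivity the abstract refers to.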