Fang Mengyu, Hutson Alan David, Yu Han
Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263, USA.
Cancers (Basel). 2025 Aug 21;17(16):2713. doi: 10.3390/cancers17162713.
Inter-rater reliability is critical in oncology to ensure consistent and reliable measurements across raters and methods, such as when evaluating biomarker levels in different laboratories or comparing tumor size assessments by radiation oncologists during therapy planning. This consistency is essential for informed decision-making in both clinical and research contexts, and the intraclass correlation coefficient (ICC) is a widely recommended statistic for assessing agreement. This work focuses on hypothesis testing of the ICC(2,1) with two raters. We evaluated the performance of a naive permutation test for testing the hypothesis H0: ICC = 0 and found that it fails to reliably control the type I error rate. To address this, we developed a robust permutation test based on a studentized statistic, which we prove to be asymptotically valid even when paired variables are uncorrelated but dependent. Simulation studies demonstrate that the proposed test consistently maintains type I error control, even with small sample sizes, outperforming the naive approach across various data-generating scenarios. The proposed studentized permutation test for ICC(2,1) offers a statistically valid and robust method for assessing inter-rater reliability and demonstrates practical utility when applied to two real-world oncology datasets.
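As a rough illustration of the baseline that the abstract calls the naive permutation test, the sketch below computes ICC(2,1) for two raters from the usual two-way ANOVA mean squares and builds a permutation null by shuffling the second rater's scores across subjects. The function names `icc_2_1` and `naive_permutation_test`, the one-sided alternative (ICC > 0), the number of permutations, and the seed are assumptions made for this example and are not taken from the paper. The studentized statistic proposed by the authors is not specified in the abstract, so it is not reproduced here.

```python
import numpy as np


def icc_2_1(x, y):
    """ICC(2,1): two-way random effects, absolute agreement, single rating, two raters."""
    data = np.column_stack([x, y]).astype(float)
    n, k = data.shape  # n subjects, k = 2 raters
    grand = data.mean()
    # Sums of squares for subjects (rows), raters (columns), and residual error
    ss_rows = k * np.sum((data.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((data.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((data - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def naive_permutation_test(x, y, n_perm=2000, seed=0):
    """Naive (unstudentized) permutation test of H0: ICC = 0.

    Assumption for this sketch: a one-sided alternative ICC > 0.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = icc_2_1(x, y)
    # Break the pairing by permuting rater 2's scores across subjects
    perm_stats = np.array([icc_2_1(x, rng.permutation(y)) for _ in range(n_perm)])
    # Add-one correction keeps the p-value strictly positive
    p_value = (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)
    return observed, p_value
```

According to the abstract, this unstudentized version does not reliably control the type I error rate, which is the motivation for the studentized test; the sketch is meant only to make the baseline procedure concrete.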